SYNTHETIC NUCLEIC ACID MOLECULE COMPOSITIONS AND
METHODS OF PREPARATION 

Statement of Government Rights

The invention was made at least in part with a grant from the Government of
the United States of America (grant DMI-9402762 from the National Science
Foundation). The Government has certain rights to the invention.

Background of the Invention 

Transcription, the synthesis of an RNA molecule from a sequence of DNA, is
the first step in gene expression. Sequences which regulate DNA transcription
include promoter sequences, polyadenylation signals, transcription factor binding
sites and enhancer elements. A promoter is a DNA sequence capable of specific
initiation of transcription and consists of three general regions. The core promoter is
the sequence where the RNA polymerase and its cofactors bind to the DNA.
Immediately upstream of the core promoter is the proximal promoter, which contains
several transcription factor binding sites that are responsible for the assembly of an
activation complex that in turn recruits the polymerase complex. The distal
promoter, located further upstream of the proximal promoter, also contains
transcription factor binding sites. Transcription termination and polyadenylation,
like transcription initiation, are site specific and encoded by defined sequences.
Enhancers are regulatory regions, containing multiple transcription factor binding
sites, that can significantly increase the level of transcription from a responsive
promoter regardless of the enhancer's orientation and distance with respect to the
promoter as long as the enhancer and promoter are located within the same DNA
molecule. The amount of transcript produced from a gene may also be regulated by
a post-transcriptional mechanism, the most important being RNA splicing that
removes intervening sequences (introns) from a primary transcript between splice
donor and splice acceptor sequences.



Natural selection is the hypothesis that genotype-environment interactions 
occurring at the phenotypic level lead to differential reproductive success of 
individuals and therefore to modification of the gene pool of a population. 
Some properties of nucleic acid molecules that are acted upon by natural selection 
include codon usage frequency, RNA secondary structure, the efficiency of intron
splicing, and interactions with transcription factors or other nucleic acid binding 
proteins. Because of the degenerate nature of the genetic code, these properties can 
be optimized by natural selection without altering the corresponding amino acid 
sequence. 

Under some conditions, it is useful to synthetically alter the natural
nucleotide sequence encoding a polypeptide to better adapt the polypeptide for
alternative applications. A common example is to alter the codon usage frequency
of a gene when it is expressed in a foreign host cell. Although redundancy in the
genetic code allows amino acids to be encoded by multiple codons, different
organisms favor some codons over others. It has been found that the efficiency of
protein translation in a non-native host cell can be substantially increased by
adjusting the codon usage frequency but maintaining the same gene product (U.S.
Patent Nos. 5,096,825, 5,670,356, and 5,874,304).

However, altering codon usage may, in turn, result in the unintentional
introduction into a synthetic nucleic acid molecule of inappropriate transcription
regulatory sequences. This may adversely affect transcription, resulting in
anomalous expression of the synthetic DNA. Anomalous expression is defined as
departure from normal or expected levels of expression. For example, transcription
factor binding sites located downstream from a promoter have been demonstrated to
affect promoter activity (Michael et al., 1990; Lamb et al., 1998; Johnson et al.,
1998; Jones et al., 1997). Additionally, it is not uncommon for an enhancer
element to exert activity and result in elevated levels of DNA transcription in the
absence of a promoter sequence or for the presence of transcription regulatory
sequences to increase the basal levels of gene expression in the absence of a
promoter sequence.

Thus, what is needed is a method for making synthetic nucleic acid
molecules with altered codon usage without also introducing inappropriate or
unintended transcription regulatory sequences for expression in a particular host
cell.


Summary of the Invention 

The invention provides a synthetic nucleic acid molecule comprising at least
300 nucleotides of a coding region for a polypeptide, having a codon composition
differing at more than 25% of the codons from a wild type nucleic acid sequence
encoding a polypeptide, and having at least 3-fold fewer, preferably at least 5-fold
fewer, transcription regulatory sequences than would result if the differing codons
were randomly selected. Preferably, the synthetic nucleic acid molecule encodes a
polypeptide that has an amino acid sequence that is at least 85%, preferably 90%,
and most preferably 95% or 99% identical to the amino acid sequence of the
naturally-occurring (native or wild type) polypeptide (protein) from which it is
derived. Thus, it is recognized that some specific amino acid changes may also be
desirable to alter a particular phenotypic characteristic of the polypeptide encoded
by the synthetic nucleic acid molecule. Preferably, the amino acid sequence identity
is over at least 100 contiguous amino acid residues. In one embodiment of the
invention, the codons in the synthetic nucleic acid molecule that differ preferably
encode the same amino acids as the corresponding codons in the wild type nucleic
acid sequence.
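As an illustration of the "more than 25% of the codons" criterion, the following
Python sketch compares two in-frame coding sequences of equal length, reports the
fraction of codon positions that differ, and checks whether every differing codon
still encodes the same amino acid under the standard genetic code. The function
names and the toy sequences are assumptions made for this example; they are not
taken from the specification.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def split_codons(seq):
    """Split an in-frame DNA sequence into a list of codons."""
    seq = seq.upper()
    if not seq or len(seq) % 3:
        raise ValueError("sequence must be non-empty and a multiple of 3 long")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def codon_differences(wild_type, synthetic):
    """Return (fraction of codons that differ, all differences synonymous?)."""
    w, s = split_codons(wild_type), split_codons(synthetic)
    if len(w) != len(s):
        raise ValueError("sequences must contain the same number of codons")
    differing = [(a, b) for a, b in zip(w, s) if a != b]
    fraction = len(differing) / len(w)
    synonymous = all(CODE[a] == CODE[b] for a, b in differing)
    return fraction, synonymous


# Toy example (hypothetical sequences): Met-Leu-Lys-Ser in both cases.
fraction, synonymous = codon_differences("ATGTTAAAATCG", "ATGCTGAAGTCG")
print(fraction > 0.25, synonymous)
```

A real comparison would use full-length coding regions (at least 300 nucleotides
in the embodiment above) rather than a four-codon toy sequence.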

The transcription regulatory sequences which are reduced in the synthetic
nucleic acid molecule include, but are not limited to, any combination of
transcription factor binding sequences, intron splice sites, poly(A) addition sites,
enhancer sequences and promoter sequences. Transcription regulatory sequences are
well known in the art.

It is preferred that the synthetic nucleic acid molecule of the invention has a
codon composition that differs from that of the wild type nucleic acid sequence at
more than 30%, 35%, 40% or more than 45%, e.g., 50%, 55%, 60% or more of the
codons. Preferred codons for use in the invention are those which are employed
more frequently than at least one other codon for the same amino acid in a particular
organism and, more preferably, are also not low-usage codons in that organism and
are not low-usage codons in the organism used to clone or screen for the expression
of the synthetic nucleic acid molecule (for example, E. coli). Moreover, preferred
codons for certain amino acids (i.e., those amino acids that have three or more
codons) may include two or more codons that are employed more frequently than
the other (non-preferred) codon(s). The presence of codons in the synthetic nucleic
acid molecule that are employed more frequently in one organism than in another
organism results in a synthetic nucleic acid molecule which, when introduced into
the cells of the organism that employs those codons more frequently, is expressed in
those cells at a level that is greater than the expression of the wild type or parent
nucleic acid sequence in those cells. For example, the synthetic nucleic acid
molecule of the invention is expressed at a level that is at least about 110%, e.g.,
150%, 200%, 500% or more (1000%, 5000%, or 10000%) of that of the wild type
nucleic acid sequence in a cell or cell extract under identical conditions (such as cell
culture conditions, vector backbone, and the like).

In one embodiment of the invention, the codons that are different are those
employed more frequently in a mammal, while in another embodiment the codons
that are different are those employed more frequently in a plant. A particular type of
mammal, e.g., human, may have a different set of preferred codons than another
type of mammal. Likewise, a particular type of plant may have a different set of
preferred codons than another type of plant. In one embodiment of the invention,
the majority of the codons which differ are ones that are preferred codons in a
desired host cell. Preferred codons for mammals (e.g., humans) and plants are
known to the art (e.g., Wada et al., 1990). For example, preferred human codons
include, but are not limited to, CGC (Arg), CTG (Leu), TCT (Ser), AGC (Ser), ACC
(Thr), CCA (Pro), CCT (Pro), GCC (Ala), GGC (Gly), GTG (Val), ATC (Ile), ATT
(Ile), AAG (Lys), AAC (Asn), CAG (Gln), CAC (His), GAG (Glu), GAC (Asp),
TAC (Tyr), TGC (Cys) and TTC (Phe) (Wada et al., 1990). Thus, preferred
"humanized" synthetic nucleic acid molecules of the invention have a codon
composition which differs from a wild type nucleic acid sequence by having an
increased number of the preferred human codons, e.g., CGC, CTG, TCT, AGC,
ACC, CCA, CCT, GCC, GGC, GTG, ATC, ATT, AAG, AAC, CAG, CAC, GAG,
GAC, TAC, TGC, TTC, or any combination thereof. For example, the synthetic
nucleic acid molecule of the invention may have an increased number of CTG or
TTG leucine-encoding codons, GTG or GTC valine-encoding codons, GGC or GGT
glycine-encoding codons, ATC or ATT isoleucine-encoding codons, CCA or CCT
proline-encoding codons, CGC or CGT arginine-encoding codons, AGC or TCT
serine-encoding codons, ACC or ACT threonine-encoding codons, GCC or GCT
alanine-encoding codons, or any combination thereof, relative to the wild type
nucleic acid sequence. Similarly, synthetic nucleic acid molecules having an
increased number of codons that are employed more frequently in plants have a
codon composition which differs from a wild type or parent nucleic acid sequence
by having an increased number of the plant codons including, but not limited to,
CGC (Arg), CTT (Leu), TCT (Ser), TCC (Ser), ACC (Thr), CCA (Pro), CCT (Pro),
GCT (Ala), GGA (Gly), GTG (Val), ATC (Ile), ATT (Ile), AAG (Lys), AAC (Asn),
CAA (Gln), CAC (His), GAG (Glu), GAC (Asp), TAC (Tyr), TGC (Cys), TTC
(Phe), or any combination thereof (Murray et al., 1989). Preferred codons may
differ for different types of plants (Wada et al., 1990).
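The substitution of preferred human codons described above can be sketched in
Python as follows. This is a minimal illustration only: it assumes one preferred
codon per amino acid (the first one listed above for each amino acid), leaves
Met, Trp and stop codons untouched, and ignores the other selection criteria
discussed in this specification, such as avoiding transcription regulatory
sequences. The names `PREFERRED_HUMAN` and `humanize` are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# One preferred human codon per amino acid, chosen (as an assumption of this
# sketch) as the first codon listed for that amino acid in the text above.
PREFERRED_HUMAN = {
    "R": "CGC", "L": "CTG", "S": "TCT", "T": "ACC", "P": "CCA", "A": "GCC",
    "G": "GGC", "V": "GTG", "I": "ATC", "K": "AAG", "N": "AAC", "Q": "CAG",
    "H": "CAC", "E": "GAG", "D": "GAC", "Y": "TAC", "C": "TGC", "F": "TTC",
}


def translate(seq):
    """Translate an in-frame DNA sequence into one-letter amino acid codes."""
    seq = seq.upper()
    return "".join(CODE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def humanize(seq):
    """Replace each codon with a preferred synonymous codon where one is listed."""
    seq = seq.upper()
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        # Met, Trp and stop codons are not in the table and are kept as-is.
        out.append(PREFERRED_HUMAN.get(CODE[codon], codon))
    return "".join(out)


parent = "ATGTTATCGAAATAA"           # hypothetical toy sequence
synthetic = humanize(parent)
print(synthetic)
print(translate(parent) == translate(synthetic))  # amino acids are unchanged
```

Because every replacement is synonymous, the encoded polypeptide is unchanged;
only the codon composition moves toward the preferred set.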

The choice of codon may be influenced by many factors such as, for
example, the desire to have an increased number of nucleotide substitutions or
decreased number of transcription regulatory sequences. Under some circumstances
(e.g., to permit removal of a transcription factor binding site) it may be desirable to
replace a non-preferred codon with a codon other than a preferred codon or a codon
other than the most preferred codon. Under other circumstances, for example, to
prepare codon distinct versions of a synthetic nucleic acid molecule, preferred codon
pairs are selected based upon the largest number of mismatched bases, as well as the
criteria described above.
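Selecting a codon pair by the largest number of mismatched bases can be
illustrated with a short Python sketch. It assumes the candidate codons for an
amino acid have already been chosen by the other criteria; the helper names
`mismatches` and `most_distinct_pair` are hypothetical.

```python
from itertools import combinations


def mismatches(codon_a, codon_b):
    """Count the positions at which two codons carry different bases."""
    return sum(x != y for x, y in zip(codon_a, codon_b))


def most_distinct_pair(candidates):
    """Return the pair of candidate codons with the most mismatched bases."""
    return max(combinations(candidates, 2), key=lambda pair: mismatches(*pair))


# Serine candidates: AGC and TCT differ at all three positions.
print(most_distinct_pair(["TCT", "TCC", "AGC"]), mismatches("AGC", "TCT"))
# Leucine candidates CTG and TTG differ at only one position.
print(mismatches("CTG", "TTG"))
```

Assigning one member of the pair to each of two codon distinct versions
maximizes the nucleotide differences between them at that position.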
The presence of codons in the synthetic nucleic acid molecule that are
employed more frequently in one organism than in another organism results in a
synthetic nucleic acid molecule which, when introduced into a cell of the organism
that employs those codons, is expressed in that cell at a level which is greater than
the level of expression of the wild type or parent nucleic acid sequence.

A synthetic nucleic acid molecule of the invention may encode a selectable
marker protein or a reporter molecule. However, the invention applies to any gene
and is not limited to synthetic reporter genes or synthetic selectable marker genes.
In one embodiment of a synthetic nucleic acid molecule of the invention that is a
reporter molecule, the synthetic nucleic acid molecule encodes a luciferase having a
codon composition different than that of a wild type or parent Renilla luciferase or a
beetle luciferase nucleic acid sequence. A synthetic click beetle luciferase nucleic
acid molecule of the invention may optionally encode the amino acid valine at
position 224 (i.e., it emits green light), or may optionally encode the amino acid
histidine at position 224, histidine at position 247, isoleucine at position 346,
glutamine at position 348 or combination thereof (i.e., it emits red light). Preferred
synthetic luciferase nucleic acid molecules that are related to a wild type Renilla
luciferase nucleic acid sequence include, but are not limited to, SEQ ID NO:21
(Rlucver2) or SEQ ID NO:22 (Rluc-final). Preferred synthetic luciferase nucleic
acid molecules that are related to click beetle luciferase nucleic acid sequences
include, but are not limited to, SEQ ID NO:7 (GRver5), SEQ ID NO:8 (GR6), SEQ
ID NO:9 (GRver5.1), SEQ ID NO:14 (RDver5), SEQ ID NO:15 (RD7), SEQ ID
NO:16 (RDver5.1), SEQ ID NO:17 (RDver5.2) or SEQ ID NO:18 (RD156-1H9).

The invention also provides an expression cassette. The expression cassette
of the invention comprises a synthetic nucleic acid molecule of the invention
operatively linked to a promoter that is functional in a cell. Preferred promoters are
those functional in mammalian cells and those functional in plant cells. Optionally,
the expression cassette may include other sequences, e.g., restriction enzyme
recognition sequences and a Kozak sequence, and be a part of a larger
polynucleotide molecule such as a plasmid, cosmid, artificial chromosome or vector,
e.g., a viral vector.

Also provided is a host cell comprising the synthetic nucleic acid molecule
of the invention, an isolated polypeptide (e.g., a fusion polypeptide encoded by the
synthetic nucleic acid molecule of the invention), and compositions and kits
comprising the synthetic nucleic acid molecule of the invention or the polypeptide
encoded thereby in suitable container means and, optionally, instruction means.
Preferred isolated polypeptides include, but are not limited to, those comprising
SEQ ID NO:31 (GRver5.1), SEQ ID NO:226 (Rluc-final), or SEQ ID NO:223
(RD156-1H9).

The invention also provides a method to prepare a synthetic nucleic acid
molecule of the invention by genetically altering a parent (either a wild type or
another synthetic) nucleic acid sequence. The method may be used to prepare a
synthetic nucleic acid molecule encoding a polypeptide comprising at least 100
amino acids. One embodiment of the invention is directed to the preparation of
synthetic genes encoding reporter or selectable marker proteins. The method of the
invention may be employed to alter the codon usage frequency and decrease the
number of transcription regulatory sequences in any open reading frame or to
decrease the number of transcription regulatory sites in a vector backbone.
Preferably, the codon usage frequency in the synthetic nucleic acid molecule is
altered to reflect that of the host organism desired for expression of that nucleic acid
molecule while also decreasing the number of potential transcription regulatory
sequences relative to the parent nucleic acid molecule.

Thus, the invention provides a method to prepare a synthetic nucleic acid
molecule comprising an open reading frame. The method comprises altering (e.g.,
decreasing or eliminating) a plurality of transcription regulatory sequences in a
parent (wild type or a synthetic) nucleic acid sequence that encodes a polypeptide
having at least 100 amino acids to yield a synthetic nucleic acid molecule which has
a decreased number of transcription regulatory sequences and which preferably
encodes the same amino acids as the parent nucleic acid molecule. The transcription
regulatory sequences are selected from the group consisting of transcription factor
binding sequences, intron splice sites, poly(A) addition sites, enhancer sequences
and promoter sequences, and the resulting synthetic nucleic acid molecule has at
least 3-fold fewer, preferably 5-fold fewer, transcription regulatory sequences
relative to the parent nucleic acid sequence. The method also comprises altering
greater than 25% of the codons in the synthetic nucleic acid sequence which has a
decreased number of transcription regulatory sequences to yield a further synthetic
nucleic acid molecule, wherein the codons that are altered encode the same amino
acids as those in the corresponding position in the synthetic nucleic acid molecule
which has a decreased number of transcription regulatory sequences and/or in the
parent nucleic acid sequence. Preferably, the codons which are altered do not result
in an increase in transcriptional regulatory sequences. Preferably, the further
synthetic nucleic acid molecule encodes a polypeptide that has at least 85%,
preferably 90%, and most preferably 95% or 99% contiguous amino acid sequence
identity to the amino acid sequence of the polypeptide encoded by the parent nucleic
acid sequence.

Alternatively, the method comprises altering greater than 25% of the codons
in a parent nucleic acid sequence which encodes a polypeptide having at least
100 amino acids to yield a codon-altered synthetic nucleic acid molecule, wherein
the codons that are altered encode the same amino acids as those present in the
corresponding positions in the parent nucleic acid sequence. Then, a plurality of
transcription regulatory sequences in the codon-altered synthetic nucleic acid
molecule are altered to yield a further synthetic nucleic acid molecule. Preferably,
the codons which are altered do not result in an increase in transcriptional regulatory
sequences. Also, preferably, the further synthetic nucleic acid molecule encodes a
polypeptide that has at least 85%, preferably 90%, and most preferably 95% or 99%
contiguous amino acid sequence identity to the amino acid sequence of the
polypeptide encoded by the parent nucleic acid sequence. Also provided is a
synthetic (including a further synthetic) nucleic acid molecule prepared by the
methods of the invention.
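The "at least 3-fold fewer" comparison against a parent sequence can be sketched
in Python by counting matches to a set of sequence motifs before and after
alteration. The two motifs below are placeholders chosen only for illustration
(they are an assumption of this sketch, not the set of regulatory sequences
searched in the specification), only the given strand is scanned, and the names
`MOTIFS`, `count_sites` and `fold_reduction` are hypothetical.

```python
import re

# Illustrative motifs only; a real screen would use a curated database of
# transcription factor binding sites, splice sites, poly(A) sites and so on.
MOTIFS = {
    "poly(A) signal": "AATAAA",
    "TATA-like element": "TATAAA",
}


def count_sites(seq, motifs=MOTIFS):
    """Count motif occurrences on the given strand, including overlaps."""
    seq = seq.upper()
    # A zero-width lookahead lets overlapping occurrences be counted.
    return sum(len(re.findall(f"(?={re.escape(m)})", seq))
               for m in motifs.values())


def fold_reduction(parent, synthetic, motifs=MOTIFS):
    """Ratio of motif counts in the parent versus the synthetic sequence."""
    p, s = count_sites(parent, motifs), count_sites(synthetic, motifs)
    if s == 0:
        return float("inf") if p else 1.0
    return p / s


parent = "AATAAAGGTATAAACCAATAAA"       # hypothetical: three motif matches
synthetic = "AATAAAGGTACAAGCCAACAAG"    # hypothetical: one match remains
print(count_sites(parent), count_sites(synthetic),
      fold_reduction(parent, synthetic))
```

In practice each candidate base change would also be checked to confirm that it
keeps the encoded amino acid and does not create a new regulatory sequence.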
As described hereinbelow, the methods of the invention were employed with
click beetle luciferase and Renilla luciferase nucleic acid sequences. While both of
these nucleic acid molecules encode luciferase proteins, they are from entirely
different families and are widely separated evolutionarily. These proteins have
unrelated amino acid sequences and protein structures, and they utilize dissimilar
chemical substrates. The fact that they share the name "luciferase" should not be
interpreted to mean that they are from the same family, or even largely similar
families. The methods produced synthetic luciferase nucleic acid molecules which
exhibited significantly enhanced levels of mammalian expression without negatively
affecting other desirable physical or biochemical properties (including protein
half-life) and which were also largely devoid of known transcription regulatory
elements.

The invention also provides at least two synthetic nucleic acid molecules that
encode highly related polypeptides, but which synthetic nucleic acid molecules have
an increased number of nucleotide differences relative to each other. These
differences decrease the recombination frequency between the two synthetic nucleic
acid molecules when those molecules are both present in a cell (i.e., they are "codon
distinct" versions of a synthetic nucleic acid molecule). Thus, the invention
provides a method for preparing at least two synthetic nucleic acid molecules that
are codon distinct versions of a parent nucleic acid sequence that encodes a
polypeptide. The method comprises altering a parent nucleic acid sequence to yield
a first synthetic nucleic acid molecule having an increased number of a first plurality
of codons that are employed more frequently in a selected host cell relative to the
number of those codons present in the parent nucleic acid sequence. Optionally, the
first synthetic nucleic acid molecule also has a decreased number of transcription
regulatory sequences relative to the parent nucleic acid sequence. The parent
nucleic acid sequence is also altered to yield a second synthetic nucleic acid
molecule having an increased number of a second plurality of codons that are
employed more frequently in the host cell relative to the number of those codons in
the parent nucleic acid sequence, wherein the first plurality of codons is different
than the second plurality of codons, and wherein the first and the second synthetic
nucleic acid molecules preferably encode the same polypeptide. Optionally, the
second synthetic nucleic acid molecule has a decreased number of transcription
regulatory sequences relative to the parent nucleic acid sequence. Either or both
synthetic molecules can then be further modified.

Clearly, the present invention has applications with many genes and across
many fields of science including, but not limited to, life science research,
agrigenetics, genetic therapy, developmental science and pharmaceutical
development.

Brief Description of the Figures

Figure 1. Codons and their corresponding amino acids.

Figure 2. A nucleotide sequence comparison of a yellow-green (YG) click
beetle luciferase nucleic acid sequence (YG #81-6G01; SEQ ID NO:2) and various
synthetic green (GR) click beetle luciferase nucleic acid sequences (GRver1, SEQ
ID NO:3; GRver2, SEQ ID NO:4; GRver3, SEQ ID NO:5; GRver4, SEQ ID NO:6;
GRver5, SEQ ID NO:7; GR6, SEQ ID NO:8; GRver5.1, SEQ ID NO:9) and various
red (RD) click beetle luciferase nucleic acid sequences (RDver1, SEQ ID NO:10;
RDver2, SEQ ID NO:11; RDver3, SEQ ID NO:12; RDver4, SEQ ID NO:13;
RDver5, SEQ ID NO:14; RD7, SEQ ID NO:15; RDver5.1, SEQ ID NO:16;
RDver5.2, SEQ ID NO:17; RD156-1H9, SEQ ID NO:18). The nucleotides enclosed
in boxes are nucleotides that differ from the nucleotide present at the homologous
position in SEQ ID NO:2.

Figure 3. An amino acid sequence comparison of a YG click beetle
luciferase amino acid sequence (YG#81-6G01, SEQ ID NO:24) and various
synthetic GR click beetle luciferase amino acid sequences (GRver1, SEQ ID NO:25;
GRver2, SEQ ID NO:26; GRver3, SEQ ID NO:27; GRver4, SEQ ID NO:28;
GRver5, SEQ ID NO:29; GR6, SEQ ID NO:30; GRver5.1, SEQ ID NO:31) and
various red (RD) click beetle luciferase amino acid sequences (RDver1, SEQ ID
NO:32; RDver2, SEQ ID NO:33; RDver3, SEQ ID NO:34; RDver4, SEQ ID
NO:218; RDver5, SEQ ID NO:219; RD7, SEQ ID NO:220; RDver5.1, SEQ ID
NO:221; RDver5.2, SEQ ID NO:222; RD156-1H9, SEQ ID NO:223). All amino
acid sequences are inferred from the corresponding nucleotide sequence. The amino
acids enclosed in boxes are amino acids that differ from the amino acid present at
the homologous position in SEQ ID NO:24.

Figure 4. Codon usage in YG#81-6G01, GRver1, RDver1, GRver5, and
RDver5, and humans (HUM) and relative codon usage in YG#81-6G01, GRver5,
RDver5, and humans.

Figure 5. Codon usage summaries for YG#81-6G01 (Figure 5A), and
GR/RD synthetic nucleic acid sequences, GRver1 (Figure 5B), RDver1 (Figure 5C),
GRver2 (Figure 5D), RDver2 (Figure 5E), GRver3 (Figure 5F), RDver3 (Figure
5G), GRver4 (Figure 5H), RDver4 (Figure 5I), GRver5 (Figure 5J), RDver5
(Figure 5K).

Figure 6. Oligonucleotides employed to prepare synthetic GR/RD luciferase
genes (SEQ ID Nos. 35-245).

Figure 7. A nucleotide sequence comparison of a wild type Renilla
reniformis luciferase nucleic acid sequence Genbank Accession No. M63501
(RELLUC, SEQ ID NO:19) and various synthetic Renilla luciferase nucleic acid
sequences (Rlucver1, SEQ ID NO:20; Rlucver2, SEQ ID NO:21; Rluc-final, SEQ
ID NO:22). The nucleotides enclosed in boxes are nucleotides that differ from the
nucleotide present at the homologous position in SEQ ID NO:19.

Figure 8. An amino acid sequence comparison of a wild type Renilla
reniformis luciferase amino acid sequence (RELLUC, SEQ ID NO:224) and various
synthetic Renilla reniformis luciferase amino acid sequences (Rlucver1, SEQ ID
NO:225; Rlucver2, SEQ ID NO:226; Rluc-final, SEQ ID NO:227). All amino acid
sequences are inferred from the corresponding nucleotide sequence. The amino
acids enclosed in boxes are amino acids that differ from the amino acid present at
the homologous position in SEQ ID NO:224.

Figure 9. Codon usage in wild-type (A) versus synthetic (B) Renilla
luciferase genes. For codon usage in selected organisms, see, e.g., Wada et al.,
1990; Sharp et al., 1988; Aota et al., 1988; and Sharp et al., 1987, and for plant
codons, Murray et al., 1989.

Figure 10. Oligonucleotides employed to prepare synthetic Renilla
luciferase gene (SEQ ID Nos. 246-292).

Figure 11. A nucleotide sequence comparison of a wild type yellow-green
(YG) click beetle luciferase nucleic acid sequence (LUCPPLYG, SEQ ID NO:1)
and the synthetic green click beetle luciferase nucleic acid sequences (GRver5.1,
SEQ ID NO:9) and the synthetic red click beetle luciferase nucleic acid sequences
(RD156-1H9, SEQ ID NO:18). The nucleotides enclosed in boxes are nucleotides
that differ from the nucleotide present at the homologous position in SEQ ID NO:1.
Both synthetic sequences have a codon composition that differs from LUCPPLYG
at more than 25% of the codons and have at least 3-fold fewer transcription
regulatory sequences relative to a random selection of codons at the codons which
differ.

Figure 12. An amino acid sequence comparison of a wild type YG click
beetle luciferase amino acid sequence (LUCPPLYG, SEQ ID NO:23) and the
synthetic GR click beetle luciferase amino acid sequences (GRver5.1, SEQ ID
NO:31) and the red (RD) click beetle luciferase amino acid sequences (RD156-1H9,
SEQ ID NO:223). All amino acid sequences are inferred from the corresponding
nucleotide sequence. The amino acids enclosed in boxes are amino acids that differ
from the amino acid present at the homologous position in SEQ ID NO:23.

Figure 13. pRL vector series. All of the vectors contain the Renilla wild
type or synthetic gene as further described herein. Figure 13A illustrates the Renilla
luciferase gene in the pGL3 vectors (Promega Corp.). Figure 13B illustrates the
Renilla luciferase co-reporter vector series. pRL-TK has the herpes simplex virus
(HSV) tk promoter; pRL-SV40 has the SV40 virus early enhancer/promoter;
pRL-CMV has the cytomegalovirus (CMV) enhancer and immediate early promoter;
pRL-null has MCS (multiple cloning sites) but no promoter or enhancer;
pRL-TK(Int-) has HSV/tk promoter without an intron that is present in the other
plasmids; pR-GL3B has the pGL3-Basic backbone (Promega Corp.); pR-GL3 TK
has the pGL3-Basic backbone with an HSV tk promoter.

Figure 14. Half-life of synthetic (Rluc-final) and native Renilla luciferases
in CHO cells.

Figures 15A-B. In vitro transcription/translation of Renilla luciferase
nucleic acid sequences. A) t = 0-60 minutes; B) linear range.

Figures 15C-D. In vitro translation of native and synthetic (Rluc-final)
Renilla luciferase RNAs in a rabbit reticulocyte lysate. RNA was quantitated and
the same amount was employed as in the translation reaction shown in Figures
15A-B. C) t = 0-60 minutes; D) linear range.

Figures 15E-F. Translation of native and synthetic (Rluc-final) Renilla
RNAs in a wheat germ extract. E) t = 0-60 minutes; F) linear range.

Figure 16. High expression from a synthetic Renilla nucleic acid sequence
reduces the risk of promoter interference in a co-transfection assay. CHO cells were
co-transfected with a constant amount (50 ng) of firefly luciferase expression vector
(pGL3 control vector, with SV40 promoter and enhancer; Luc+) and a pRL vector
having a native (0 ng, 50 ng, 100 ng, 500 ng, 1 µg or 2 µg) or synthetic (0 ng, 5 ng,
10 ng, 50 ng, 100 ng or 200 ng) Renilla luciferase gene.

Figures 17A-B. Illustrates the reactions catalyzed by firefly and click beetle
(17A), and Renilla (17B) luciferases.

Figure 18. Nucleotide and inferred amino acid sequence of click beetle
luciferases in pGL3 vectors (GRver5.1 in pGL3, SEQ ID NO:297 encoding SEQ ID
NO:298; RDver5.1 in pGL3, SEQ ID NO:299 encoding SEQ ID NO:300; and
RD156-1H9 in pGL3, SEQ ID NO:301 encoding SEQ ID NO:302). To clone
GRver5.1, RDver5.1, and RD156-1H9 nucleic acid sequences into pGL3 vectors, an
oligonucleotide having an Nco I site at the initiation codon was employed, which
resulted in an amino acid substitution at position 2 to valine.

Detailed Description of the Invention 

Definitions 

The term "gene" as used herein, refers to a DNA sequence that comprises
coding sequences necessary for the production of a polypeptide or protein precursor.
The polypeptide can be encoded by a full length coding sequence or by any portion
of the coding sequence, as long as the desired protein activity is retained.

A "nucleic acid", as used herein, is a covalently linked sequence of
nucleotides in which the 3' position of the pentose of one nucleotide is joined by a
phosphodiester group to the 5' position of the pentose of the next, and in which the
nucleotide residues (bases) are linked in specific sequence, i.e., a linear order of
nucleotides. A "polynucleotide", as used herein, is a nucleic acid containing a
sequence that is greater than about 100 nucleotides in length. An "oligonucleotide",
as used herein, is a short polynucleotide or a portion of a polynucleotide. An
oligonucleotide typically contains a sequence of about two to about one hundred
bases. The word "oligo" is sometimes used in place of the word "oligonucleotide".

Nucleic acid molecules are said to have a "5'-terminus" (5' end) and a
"3'-terminus" (3' end) because nucleic acid phosphodiester linkages occur to the 5'
carbon and 3' carbon of the pentose ring of the substituent mononucleotides. The
end of a polynucleotide at which a new linkage would be to a 5' carbon is its 5'
terminal nucleotide. The end of a polynucleotide at which a new linkage would be
to a 3' carbon is its 3' terminal nucleotide. A terminal nucleotide, as used herein, is
the nucleotide at the end position of the 3'- or 5'-terminus.

DNA molecules are said to have "5' ends" and "3' ends" because
mononucleotides are reacted to make oligonucleotides in a manner such that the 5'
phosphate of one mononucleotide pentose ring is attached to the 3' oxygen of its
neighbor in one direction via a phosphodiester linkage. Therefore, an end of an
oligonucleotide is referred to as the "5' end" if its 5' phosphate is not linked to the 3'
oxygen of a mononucleotide pentose ring and as the "3' end" if its 3' oxygen is not
linked to a 5' phosphate of a subsequent mononucleotide pentose ring.

As used herein, a nucleic acid sequence, even if internal to a larger
oligonucleotide or polynucleotide, also may be said to have 5' and 3' ends. In either
a linear or circular DNA molecule, discrete elements are referred to as being
"upstream" or 5' of the "downstream" or 3' elements. This terminology reflects the
fact that transcription proceeds in a 5' to 3' fashion along the DNA strand.
Promoter and enhancer elements that direct transcription of a linked gene
are generally located 5' or upstream of the coding region. However, enhancer
elements can exert their effect even when located 3' of the promoter element and the
coding region. Transcription termination and polyadenylation signals are located 3'
or downstream of the coding region.

The term "codon" as used herein, is a basic genetic coding unit, consisting of
a sequence of three nucleotides that specifies a particular amino acid to be
incorporated into a polypeptide chain, or a start or stop signal. Figure 1 contains a
codon table. The term "coding region" when used in reference to a structural gene
refers to the nucleotide sequences that encode the amino acids found in the nascent
polypeptide as a result of translation of an mRNA molecule. Typically, the coding
region is bounded on the 5' side by the nucleotide triplet "ATG" which encodes the
initiator methionine and on the 3' side by a stop codon (e.g., TAA, TAG, TGA). In
some cases the coding region is also known to initiate by a nucleotide triplet "TTG".
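
By way of illustration only, the bounding of a coding region by an initiator triplet and an in-frame stop codon can be sketched in Python. The function name `find_coding_region` and its arguments are hypothetical and are not part of the invention; the sketch scans one strand only and returns the first start-to-stop stretch it finds.

```python
# Stop codons named in the definition above.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_coding_region(dna, start_codons=("ATG",)):
    """Return the first stretch that begins with a start codon and ends
    with the first in-frame stop codon (stop included), or None."""
    dna = dna.upper()
    for i in range(len(dna) - 2):
        if dna[i:i + 3] in start_codons:
            # Walk codon by codon in the frame set by the start codon.
            for j in range(i + 3, len(dna) - 2, 3):
                if dna[j:j + 3] in STOP_CODONS:
                    return dna[i:j + 3]
    return None

print(find_coding_region("CCATGGCTTGAAA"))
```

Passing `start_codons=("ATG", "TTG")` would also accept the alternative initiator triplet mentioned above.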
By "protein" and "polypeptide" is meant any chain of amino acids,
regardless of length or post-translational modification (e.g., glycosylation or
phosphorylation). The synthetic genes of the invention may also encode a variant of
a naturally-occurring protein or polypeptide fragment thereof. Preferably, such a
protein or polypeptide has an amino acid sequence that is at least 85%, preferably 90%,
and most preferably 95% or 99% identical to the amino acid sequence of the
naturally-occurring (native) protein from which it is derived.

Polypeptide molecules are said to have an "amino terminus" (N-terminus)
and a "carboxy terminus" (C-terminus) because peptide linkages occur between the
backbone amino group of a first amino acid residue and the backbone carboxyl
group of a second amino acid residue. The terms "N-terminal" and "C-terminal" in
reference to polypeptide sequences refer to regions of polypeptides including
portions of the N-terminal and C-terminal regions of the polypeptide, respectively.
A sequence that includes a portion of the N-terminal region of a polypeptide includes
amino acids predominantly from the N-terminal half of the polypeptide chain, but is
not limited to such sequences. For example, an N-terminal sequence may include an
interior portion of the polypeptide sequence including residues from both the
N-terminal and C-terminal halves of the polypeptide. The same applies to
C-terminal regions. N-terminal and C-terminal regions may, but need not, include
the amino acid defining the ultimate N-terminus and C-terminus of the polypeptide,
respectively.

The term "wild type" as used herein, refers to a gene or gene product that has 
the characteristics of that gene or gene product isolated from a naturally occurring 
source. A wild type gene is that which is most frequently observed in a population
and is thus arbitrarily designated the "wild type" form of the gene. In contrast, the
term "mutant" refers to a gene or gene product that displays modifications in
sequence and/or functional properties (i.e., altered characteristics) when compared to
the wild type gene or gene product. It is noted that naturally-occurring mutants can
be isolated; these are identified by the fact that they have altered characteristics
when compared to the wild type gene or gene product.

The terms "complementary" or "complementarity" are used in reference to a
sequence of nucleotides related by the base-pairing rules. For example, the
sequence 5' "A-G-T" 3' is complementary to the sequence 3' "T-C-A" 5'.
Complementarity may be "partial," in which only some of the nucleic acids' bases
are matched according to the base pairing rules. Or, there may be "complete" or
"total" complementarity between the nucleic acids. The degree of complementarity
between nucleic acid strands has significant effects on the efficiency and strength of
hybridization between nucleic acid strands. This is of particular importance in
amplification reactions, as well as detection methods which depend upon
hybridization of nucleic acids.
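
The base-pairing rules and the notion of partial versus complete complementarity can be illustrated with a short Python sketch. The helper names below are hypothetical, only the four DNA bases are handled, and the second strand is assumed to be written 3' to 5' so that paired positions line up.

```python
# Watson-Crick pairing rules for DNA.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement_3to5(seq_5to3):
    """Return the complementary strand, written 3' to 5'."""
    return "".join(PAIR[base] for base in seq_5to3.upper())

def fraction_complementary(strand_5to3, other_3to5):
    """Fraction of positions that are correctly base paired
    (1.0 corresponds to complete complementarity)."""
    if not strand_5to3 or len(strand_5to3) != len(other_3to5):
        raise ValueError("strands must be non-empty and of equal length")
    pairs = zip(strand_5to3.upper(), other_3to5.upper())
    matched = sum(PAIR[a] == b for a, b in pairs)
    return matched / len(strand_5to3)

print(complement_3to5("AGT"))                # complement of 5' A-G-T 3'
print(fraction_complementary("AGT", "TCG"))  # partial complementarity
```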

The term "recombinant protein" or "recombinant polypeptide" as used herein 
refers to a protein molecule expressed from a recombinant DNA molecule. In 
contrast, the term "native protein" is used herein to indicate a protein isolated from a 
naturally occurring (i.e., a nonrecombinant) source. Molecular biological techniques
may be used to produce a recombinant form of a protein with identical properties as
compared to the native form of the protein.

The terms "fusion protein" and "fusion partner" refer to a chimeric protein
containing the protein of interest (e.g., luciferase) joined to an exogenous protein
fragment (e.g., a fusion partner which consists of a non-luciferase protein). The
fusion partner may enhance the solubility of the protein as expressed in a host cell,
may provide an affinity tag to allow purification of the recombinant fusion
protein from the host cell or culture supernatant, or both. If desired, the fusion
partner may be removed from the protein of interest by a variety of enzymatic or
chemical means known to the art.

The terms "cell," "cell line," "host cell," as used herein, are used 
interchangeably, and all such designations include progeny or potential progeny of 
these designations. By "transformed cell" is meant a cell into which (or into an 
ancestor of which) has been introduced a DNA molecule comprising a synthetic 

gene. Optionally, a synthetic gene of the invention may be introduced into a
suitable cell line so as to create a stably-transfected cell line capable of producing
the protein or polypeptide encoded by the synthetic gene. Vectors, cells, and
methods for constructing such cell lines are well known in the art, e.g., in Ausubel, et
al. (infra). The words "transformants" or "transformed cells" include the primary
transformed cells derived from the originally transformed cell without regard to the
number of transfers. All progeny may not be precisely identical in DNA content, 
due to deliberate or inadvertent mutations. Nonetheless, mutant progeny that have 
the same functionality as screened for in the originally transformed cell are included 
in the definition of transformants. 

Nucleic acids are known to contain different types of mutations. A "point"
mutation refers to an alteration in the sequence of a nucleotide at a single base
position from the wild type sequence. Mutations may also refer to insertion or
deletion of one or more bases, so that the nucleic acid sequence differs from the
wild-type sequence.

The term "homology" refers to a degree of complementarity. There may be
partial homology or complete homology (i.e., identity). Homology is often
measured using sequence analysis software (e.g., Sequence Analysis Software
Package of the Genetics Computer Group, University of Wisconsin Biotechnology
Center, 1710 University Avenue, Madison, WI 53705). Such software matches
similar sequences by assigning degrees of homology to various substitutions,
deletions, insertions, and other modifications. Conservative substitutions typically
include substitutions within the following groups: glycine, alanine; valine,
isoleucine, leucine; aspartic acid, glutamic acid, asparagine, glutamine; serine,
threonine; lysine, arginine; and phenylalanine, tyrosine.
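
As a non-limiting illustration, the conservative substitution groups listed above can be encoded with standard one-letter amino acid codes and queried as follows. The name `is_conservative` is hypothetical, and treating an identical pair as "not a substitution" is an assumption of this sketch.

```python
# Groups from the text: Gly/Ala; Val/Ile/Leu; Asp/Glu/Asn/Gln;
# Ser/Thr; Lys/Arg; Phe/Tyr (one-letter codes).
CONSERVATIVE_GROUPS = [
    set("GA"), set("VIL"), set("DENQ"), set("ST"), set("KR"), set("FY"),
]

def is_conservative(residue_a, residue_b):
    """True if replacing residue_a with residue_b stays within one group."""
    a, b = residue_a.upper(), residue_b.upper()
    if a == b:
        return False  # identical residues: no substitution took place
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)

print(is_conservative("V", "L"))
print(is_conservative("G", "K"))
```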

A "partially complementary" sequence that at least partially inhibits a
completely complementary sequence from hybridizing to a target nucleic acid is
referred to using the functional term "substantially homologous." The inhibition of
hybridization of the completely complementary sequence to the target sequence may
be examined using a hybridization assay (Southern or Northern blot, solution
hybridization and the like) under conditions of low stringency. A substantially
homologous sequence or probe will compete for and inhibit the binding (i.e., the
hybridization) of a completely homologous sequence to a target under conditions of low
stringency. This is not to say that conditions of low stringency are such that
non-specific binding is permitted; low stringency conditions require that the binding
of two sequences to one another be a specific (i.e., selective) interaction. The
absence of non-specific binding may be tested by the use of a second target which
lacks even a partial degree of complementarity (e.g., less than about 30% identity).
In this case, in the absence of non-specific binding, the probe will not hybridize to
the second non-complementary target.

When used in reference to a double-stranded nucleic acid sequence such as a 
cDNA or a genomic clone, the term "substantially homologous" refers to any probe 
which can hybridize to either or both strands of the double-stranded nucleic acid 
sequence under conditions of low stringency as described herein.

"Probe" refers to an oligonucleotide designed to be sufficiently
complementary to a sequence in a denatured nucleic acid to be probed (in relation to
its length) to be bound under selected stringency conditions.

"Hybridization" and "binding" in the context of probes and denatured (melted)
nucleic acid are used interchangeably. Probes which are hybridized or bound to
denatured nucleic acid are base paired to complementary sequences in the
polynucleotide. Whether or not a particular probe remains base paired with the
polynucleotide depends on the degree of complementarity, the length of the probe,
and the stringency of the binding conditions. The higher the stringency, the higher
must be the degree of complementarity and/or the longer the probe.

The term "hybridization" is used in reference to the pairing of
complementary nucleic acid strands. Hybridization and the strength of
hybridization (i.e., the strength of the association between nucleic acid strands) are
impacted by many factors well known in the art including the degree of
complementarity between the nucleic acids, the stringency of the conditions involved
(as affected by such conditions as the concentration of salts), the Tm (melting
temperature) of the formed hybrid, the presence of other components (e.g., the
presence or absence of polyethylene glycol), the molarity of the hybridizing strands
and the G:C content of the nucleic acid strands.

The term "stringency" is used in reference to the conditions of temperature,
ionic strength, and the presence of other compounds, under which nucleic acid
hybridizations are conducted. With "high stringency" conditions, nucleic acid base
pairing will occur only between nucleic acid fragments that have a high frequency of
complementary base sequences. Thus, conditions of "medium" or "low" stringency
are often required when it is desired that nucleic acids which are not completely
complementary to one another be hybridized or annealed together. The art knows
well that numerous equivalent conditions can be employed to comprise medium or
low stringency conditions. The choice of hybridization conditions is generally
evident to one skilled in the art and is usually guided by the purpose of the
hybridization, the type of hybridization (DNA-DNA or DNA-RNA), and the level of
desired relatedness between the sequences (e.g., Sambrook et al., 1989; Nucleic
Acid Hybridization, A Practical Approach, IRL Press, Washington D.C., 1985, for a
general discussion of the methods).

The stability of nucleic acid duplexes is known to decrease with an increased
number of mismatched bases, and further to be decreased to a greater or lesser
degree depending on the relative positions of mismatches in the hybrid duplexes.
Thus, the stringency of hybridization can be used to maximize or minimize stability
of such duplexes. Hybridization stringency can be altered by: adjusting the
temperature of hybridization; adjusting the percentage of helix destabilizing agents,
such as formamide, in the hybridization mix; and adjusting the temperature and/or
salt concentration of the wash solutions. For filter hybridizations, the final
stringency of hybridizations often is determined by the salt concentration and/or
temperature used for the post-hybridization washes.

"High stringency conditions" when used in reference to nucleic acid
hybridization comprise conditions equivalent to binding or hybridization at 42°C in
a solution consisting of 5X SSPE (43.8 g/l NaCl, 6.9 g/l NaH2PO4·H2O and 1.85 g/l
EDTA, pH adjusted to 7.4 with NaOH), 0.5% SDS, 5X Denhardt's reagent and 100
µg/ml denatured salmon sperm DNA followed by washing in a solution comprising
0.1X SSPE, 1.0% SDS at 42°C when a probe of about 500 nucleotides in length is
employed.

"Medium stringency conditions" when used in reference to nucleic acid
hybridization comprise conditions equivalent to binding or hybridization at 42°C in
a solution consisting of 5X SSPE (43.8 g/l NaCl, 6.9 g/l NaH2PO4·H2O and 1.85 g/l
EDTA, pH adjusted to 7.4 with NaOH), 0.5% SDS, 5X Denhardt's reagent and 100
µg/ml denatured salmon sperm DNA followed by washing in a solution comprising
1.0X SSPE, 1.0% SDS at 42°C when a probe of about 500 nucleotides in length is
employed.

"Low stringency conditions" comprise conditions equivalent to binding or
hybridization at 42°C in a solution consisting of 5X SSPE (43.8 g/l NaCl, 6.9 g/l
NaH2PO4·H2O and 1.85 g/l EDTA, pH adjusted to 7.4 with NaOH), 0.1% SDS, 5X
Denhardt's reagent [50X Denhardt's contains per 500 ml: 5 g Ficoll (Type 400,
Pharmacia), 5 g BSA (Fraction V; Sigma)] and 100 µg/ml denatured salmon sperm
DNA followed by washing in a solution comprising 5X SSPE, 0.1% SDS at 42°C
when a probe of about 500 nucleotides in length is employed.
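
For convenience, the wash conditions recited above and the corresponding SSPE component concentrations can be tabulated with a short Python sketch. The names `WASH_CONDITIONS` and `sspe_grams_per_liter` are hypothetical; the sketch assumes only that an nX SSPE solution is a linear dilution of the 5X recipe stated above.

```python
# Component masses (g/l) of 5X SSPE, as recited in the definitions above.
SSPE_5X_G_PER_L = {"NaCl": 43.8, "NaH2PO4.H2O": 6.9, "EDTA": 1.85}

# Wash conditions at 42 degrees C: (SSPE strength in X, percent SDS).
WASH_CONDITIONS = {
    "high": (0.1, 1.0),
    "medium": (1.0, 1.0),
    "low": (5.0, 0.1),
}

def sspe_grams_per_liter(x_strength):
    """Scale the 5X recipe linearly to the requested X strength."""
    factor = x_strength / 5.0
    return {name: grams * factor for name, grams in SSPE_5X_G_PER_L.items()}

for level, (sspe_x, sds_percent) in WASH_CONDITIONS.items():
    print(level, sspe_x, sds_percent, sspe_grams_per_liter(sspe_x))
```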
The term "Tm" is used in reference to the "melting temperature". The
melting temperature is the temperature at which 50% of a population of
double-stranded nucleic acid molecules becomes dissociated into single strands.
The equation for calculating the Tm of nucleic acids is well-known in the art. The
Tm of a hybrid nucleic acid is often estimated using a formula adopted from
hybridization assays in 1 M salt, and commonly used for calculating Tm for PCR
primers: [(number of A + T) x 2°C + (number of G + C) x 4°C] (C.R. Newton et al.,
PCR, 2nd Ed., Springer-Verlag (New York, 1997), p. 24). This formula was found
to be inaccurate for primers longer than 20 nucleotides. (Id.) Another simple
estimate of the Tm value may be calculated by the equation: Tm = 81.5 + 0.41(% G +
C), when a nucleic acid is in aqueous solution at 1 M NaCl (e.g., Anderson and
Young, Quantitative Filter Hybridization, in Nucleic Acid Hybridization, 1985).
Other more sophisticated computations exist in the art which take structural as well
as sequence characteristics into account for the calculation of Tm. A calculated Tm is
merely an estimate; the optimum temperature is commonly determined empirically.
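
The two simple Tm estimates described above can be sketched in Python as follows. The function names are hypothetical, and the second function implements the equation exactly as stated here (with no length or salt correction terms); neither is intended as a substitute for empirical determination.

```python
def tm_2_4_rule(seq):
    """Tm estimate in degrees C: 2 per A or T plus 4 per G or C.
    Stated above to be inaccurate for primers longer than 20 nucleotides."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc

def tm_percent_gc(seq):
    """Tm estimate in degrees C: 81.5 + 0.41 * (% G + C), for a nucleic
    acid in aqueous solution at 1 M NaCl."""
    s = seq.upper()
    if not s:
        raise ValueError("sequence must not be empty")
    gc = s.count("G") + s.count("C")
    return 81.5 + 0.41 * (100.0 * gc / len(s))

primer = "ATGCATGCATGCATGCATGC"  # hypothetical 20-mer, 50% G + C
print(tm_2_4_rule(primer), tm_percent_gc(primer))
```

The two estimates differ substantially for the same sequence because they derive from different assay contexts, which is consistent with the statement that a calculated Tm is merely an estimate.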

The term "isolated" when used in relation to a nucleic acid, as in "isolated
oligonucleotide" or "isolated polynucleotide" refers to a nucleic acid sequence that
is identified and separated from at least one contaminant with which it is ordinarily
associated in its source. Thus, an isolated nucleic acid is present in a form or setting
that is different from that in which it is found in nature. In contrast, non-isolated
nucleic acids (e.g., DNA and RNA) are found in the state they exist in nature. For
example, a given DNA sequence (e.g., a gene) is found on the host cell chromosome
in proximity to neighboring genes; RNA sequences (e.g., a specific mRNA sequence
encoding a specific protein), are found in the cell as a mixture with numerous other
mRNAs that encode a multitude of proteins. However, isolated nucleic acid
includes, by way of example, such nucleic acid in cells ordinarily expressing that
nucleic acid where the nucleic acid is in a chromosomal location different from that
of natural cells, or is otherwise flanked by a different nucleic acid sequence than that
found in nature. The isolated nucleic acid or oligonucleotide may be present in
single-stranded or double-stranded form. When an isolated nucleic acid or
oligonucleotide is to be utilized to express a protein, the oligonucleotide contains at
a minimum, the sense or coding strand (i.e., the oligonucleotide may be
single-stranded), but may contain both the sense and anti-sense strands (i.e., the
oligonucleotide may be double-stranded).

The term "isolated" when used in relation to a polypeptide, as in "isolated
protein" or "isolated polypeptide" refers to a polypeptide that is identified and
separated from at least one contaminant with which it is ordinarily associated in its
source. Thus, an isolated polypeptide is present in a form or setting that is different
from that in which it is found in nature. In contrast, non-isolated polypeptides (e.g.,
proteins and enzymes) are found in the state they exist in nature.

The term "purified" or "to purify" means the result of any process that
removes some of a contaminant from the component of interest, such as a protein or
nucleic acid. The percent of a purified component is thereby increased in the 
sample. 

The term "operably linked" as used herein refers to the linkage of nucleic acid
sequences in such a manner that a nucleic acid molecule capable of directing the
transcription of a given gene and/or the synthesis of a desired protein molecule is
produced. The term also refers to the linkage of sequences encoding amino acids in
such a manner that a functional (e.g., enzymatically active, capable of binding to a
binding partner, capable of inhibiting, etc.) protein or polypeptide is produced.

The term "recombinant DNA molecule" means a hybrid DNA sequence
comprising at least two nucleotide sequences not normally found together in nature.

The term "vector" is used in reference to nucleic acid molecules into which
fragments of DNA may be inserted or cloned and which can be used to transfer DNA
segment(s) into a cell and are capable of replication in a cell. Vectors may be derived
from plasmids, bacteriophages, viruses, cosmids, and the like.

The terms "recombinant vector" and "expression vector" as used herein refer
to DNA or RNA sequences containing a desired coding sequence and appropriate
DNA or RNA sequences necessary for the expression of the operably linked coding
sequence in a particular host organism. Prokaryotic expression vectors include a
promoter, a ribosome binding site, an origin of replication for autonomous
replication in a host cell and possibly other sequences, e.g., an optional operator
sequence and optional restriction enzyme sites. A promoter is defined as a DNA
sequence that directs RNA polymerase to bind to DNA and to initiate RNA
synthesis. Eukaryotic expression vectors include a promoter, optionally a
polyadenylation signal and optionally an enhancer sequence.

The term "a polynucleotide having a nucleotide sequence encoding a gene,"
means a nucleic acid sequence comprising the coding region of a gene, or in other
words the nucleic acid sequence which encodes a gene product. The coding region
may be present in either a cDNA, genomic DNA or RNA form. When present in a
DNA form, the oligonucleotide may be single-stranded (i.e., the sense strand) or
double-stranded. Suitable control elements such as enhancers/promoters, splice
junctions, polyadenylation signals, etc. may be placed in close proximity to the
coding region of the gene if needed to permit proper initiation of transcription
and/or correct processing of the primary RNA transcript. Alternatively, the coding
region utilized in the expression vectors of the present invention may contain
endogenous enhancers/promoters, splice junctions, intervening sequences,
polyadenylation signals, etc. In further embodiments, the coding region may contain
a combination of both endogenous and exogenous control elements.

The term "transcription regulatory element" or "transcription regulatory
sequence" refers to a genetic element or sequence that controls some aspect of the
expression of nucleic acid sequence(s). For example, a promoter is a regulatory
element that facilitates the initiation of transcription of an operably linked coding
region. Other regulatory elements include, but are not limited to, transcription
factor binding sites, splicing signals, polyadenylation signals, termination signals
and enhancer elements.

Transcriptional control signals in eukaryotes comprise "promoter" and
"enhancer" elements. Promoters and enhancers consist of short arrays of DNA
sequences that interact specifically with cellular proteins involved in transcription
(Maniatis et al., 1987). Promoter and enhancer elements have been isolated from a
variety of eukaryotic sources including genes in yeast, insect and mammalian cells.
Promoter and enhancer elements have also been isolated from viruses, and analogous
control elements, such as promoters, are also found in prokaryotes. The selection of
a particular promoter and enhancer depends on the cell type used to express the
protein of interest. Some eukaryotic promoters and enhancers have a broad host
range while others are functional in a limited subset of cell types (for review, see
Voss et al., 1986; and Maniatis et al., 1987). For example, the SV40 early gene
enhancer is very active in a wide variety of cell types from many mammalian
species and has been widely used for the expression of proteins in mammalian cells
(Dijkema et al., 1985). Other examples of promoter/enhancer elements active
in a broad range of mammalian cell types are those from the human elongation
factor 1 gene (Uetsuki et al., 1989; Kim et al., 1990; and Mizushima and Nagata,
1990), the long terminal repeats of the Rous sarcoma virus (Gorman et al.,
1982), and the human cytomegalovirus (Boshart et al., 1985).

The term "promoter/enhancer" denotes a segment of DNA containing
sequences capable of providing both promoter and enhancer functions (i.e., the
functions provided by a promoter element and an enhancer element as described
above). For example, the long terminal repeats of retroviruses contain both
promoter and enhancer functions. The enhancer/promoter may be "endogenous" or
"exogenous" or "heterologous." An "endogenous" enhancer/promoter is one that is
naturally linked with a given gene in the genome. An "exogenous" or
"heterologous" enhancer/promoter is one that is placed in juxtaposition to a gene by
means of genetic manipulation (i.e., molecular biological techniques) such that
transcription of the gene is directed by the linked enhancer/promoter.

The presence of "splicing signals" on an expression vector often results in
higher levels of expression of the recombinant transcript in eukaryotic host cells.
Splicing signals mediate the removal of introns from the primary RNA transcript
and consist of a splice donor and acceptor site (Sambrook, et al., Molecular Cloning:
A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press, New York,
1989, pp. 16.7-16.8). A commonly used splice donor and acceptor site is the splice
junction from the 16S RNA of SV40.

Efficient expression of recombinant DNA sequences in eukaryotic cells
requires expression of signals directing the efficient termination and
polyadenylation of the resulting transcript. Transcription termination signals are
generally found downstream of the polyadenylation signal and are a few hundred
nucleotides in length. The term "poly(A) site" or "poly(A) sequence" as used herein
denotes a DNA sequence which directs both the termination and polyadenylation of
the nascent RNA transcript. Efficient polyadenylation of the recombinant transcript
is desirable, as transcripts lacking a poly(A) tail are unstable and are rapidly
degraded. The poly(A) signal utilized in an expression vector may be "heterologous"
or "endogenous." An endogenous poly(A) signal is one that is found naturally at the
3' end of the coding region of a given gene in the genome. A heterologous poly(A)
signal is one which has been isolated from one gene and positioned 3' to another
gene. A commonly used heterologous poly(A) signal is the SV40 poly(A) signal.
The SV40 poly(A) signal is contained on a 237 bp BamHI/BclI restriction fragment
and directs both termination and polyadenylation (Sambrook, supra, at 16.6-16.7).

Eukaryotic expression vectors may also contain "viral replicons" or "viral
origins of replication." Viral replicons are viral DNA sequences which allow for the
extrachromosomal replication of a vector in a host cell expressing the appropriate
replication factors. Vectors containing either the SV40 or polyoma virus origin of
replication replicate to high copy number (up to 10^4 copies/cell) in cells that express
the appropriate viral T antigen. In contrast, vectors containing the replicons from
bovine papillomavirus or Epstein-Barr virus replicate extrachromosomally at low
copy number (about 100 copies/cell).

The term "in vitro" refers to an artificial environment and to processes or
reactions that occur within an artificial environment. In vitro environments include,
but are not limited to, test tubes and cell lysates. The term "in situ" refers to cell
culture. The term "in vivo" refers to the natural environment (e.g., an animal or a
cell) and to processes or reactions that occur within a natural environment.

The term "expression system" refers to any assay or system for determining
(e.g., detecting) the expression of a gene of interest. Those skilled in the field of
molecular biology will understand that any of a wide variety of expression systems
may be used. A wide range of suitable mammalian cells are available from a wide
range of sources (e.g., the American Type Culture Collection, Rockville, MD). The
method of transformation or transfection and the choice of expression vehicle will
depend on the host system selected. Transformation and transfection methods are
described, e.g., in Ausubel, et al., Current Protocols in Molecular Biology, John
Wiley & Sons, New York, 1992. Expression systems include in vitro gene
expression assays where a gene of interest (e.g., a reporter gene) is linked to a
regulatory sequence and the expression of the gene is monitored following treatment
with an agent that inhibits or induces expression of the gene. Detection of gene
expression can be through any suitable means including, but not limited to,
detection of expressed mRNA or protein (e.g., a detectable product of a reporter
gene) or through a detectable change in the phenotype of a cell expressing the gene
of interest. Expression systems may also comprise assays where a cleavage event or
other nucleic acid or cellular change is detected.

The term "enzyme" refers to molecules or molecule aggregates that are
responsible for catalyzing chemical and biological reactions. Such molecules are
typically proteins, but can also comprise short peptides, RNAs, ribozymes,
antibodies, and other molecules. A molecule that catalyzes chemical and biological
reactions is referred to as "having enzyme activity" or "having catalytic activity."

All amino acid residues identified herein are in the natural L-configuration.
In keeping with standard polypeptide nomenclature (see J. Biol. Chem., 243, 3557
(1969)), abbreviations for amino acid residues are as shown in the following Table
of Correspondence.

The term "sequence homology" means the proportion of base matches between
two nucleic acid sequences or the proportion of amino acid matches between two amino
acid sequences. When sequence homology is expressed as a percentage, e.g., 50%, the
percentage denotes the proportion of matches over the length of sequence from one
sequence that is compared to some other sequence. Gaps (in either of the two
sequences) are permitted to maximize matching; gap lengths of 15 bases or less are
usually used, 6 bases or less are preferred with 2 bases or less more preferred. When
using oligonucleotides as probes or treatments, the sequence homology between the
target nucleic acid and the oligonucleotide sequence is generally not less than 17 target
base matches out of 20 possible oligonucleotide base pair matches (85%); preferably
not less than 9 matches out of 10 possible base pair matches (90%), and more
preferably not less than 19 matches out of 20 possible base pair matches (95%).
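
The percentages recited above follow from simple match counting, as the following illustrative Python sketch shows. The names `percent_matches` and `meets_threshold` are hypothetical, and the sketch assumes two ungapped sequences of equal length that are already aligned position by position.

```python
def percent_matches(oligo, target):
    """Percentage of positions at which two equal-length, pre-aligned
    sequences carry the same base (no gaps handled in this sketch)."""
    if not oligo or len(oligo) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(oligo.upper(), target.upper()))
    return 100.0 * matches / len(oligo)

def meets_threshold(oligo, target, minimum_percent=85.0):
    """True if the match percentage is not less than the threshold."""
    return percent_matches(oligo, target) >= minimum_percent

oligo = "A" * 20
target = "A" * 17 + "C" * 3   # 17 matches out of 20
print(percent_matches(oligo, target))
print(meets_threshold(oligo, target, 90.0))
```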

Two amino acid sequences are homologous if there is a partial or complete
identity between their sequences. For example, 85% homology means that 85% of
the amino acids are identical when the two sequences are aligned for maximum
matching. Gaps (in either of the two sequences being matched) are allowed in
maximizing matching; gap lengths of 5 or less are preferred with 2 or less being
more preferred. Alternatively and preferably, two protein sequences (or polypeptide
sequences derived from them of at least 100 amino acids in length) are homologous,
as this term is used herein, if they have an alignment score of more than 5 (in
standard deviation units) using the program ALIGN with the mutation data matrix
and a gap penalty of 6 or greater. See Dayhoff, M. O., in Atlas of Protein Sequence
and Structure, 1972, volume 5, National Biomedical Research Foundation,
pp. 101-110, and Supplement 2 to this volume, pp. 1-10. The two sequences or
parts thereof are more preferably homologous if their amino acids are greater than or
equal to 85% identical when optimally aligned using the ALIGN program.

The following terms are used to describe the sequence relationships between
two or more polynucleotides: "reference sequence", "comparison window",
"sequence identity", "percentage of sequence identity", and "substantial identity".
A "reference sequence" is a defined sequence used as a basis for a sequence
comparison; a reference sequence may be a subset of a larger sequence, for example,
as a segment of a full-length cDNA or gene sequence given in a sequence listing, or
may comprise a complete cDNA or gene sequence. Generally, a reference sequence
is at least 20 nucleotides in length, frequently at least 25 nucleotides in length, and
often at least 50 nucleotides in length. Since two polynucleotides may each (1)
comprise a sequence (i.e., a portion of the complete polynucleotide sequence) that is
similar between the two polynucleotides, and (2) may further comprise a sequence
that is divergent between the two polynucleotides, sequence comparisons between
two (or more) polynucleotides are typically performed by comparing sequences of
the two polynucleotides over a "comparison window" to identify and compare local
regions of sequence similarity.

A "comparison window", as used herein, refers to a conceptual segment of at
least 20 contiguous nucleotides wherein the portion of the polynucleotide
sequence in the comparison window may comprise additions or deletions (i.e., gaps) 
of 20 percent or less as compared to the reference sequence (which does not 
comprise additions or deletions) for optimal alignment of the two sequences. 

Methods of alignment of sequences for comparison are well known in the
art. Thus, the determination of percent identity between any two sequences can be
accomplished using a mathematical algorithm. Preferred, non-limiting examples of
such mathematical algorithms are the algorithm of Myers and Miller (1988); the
local homology algorithm of Smith and Waterman (1981); the homology alignment
algorithm of Needleman and Wunsch (1970); the search-for-similarity method of
Pearson and Lipman (1988); and the algorithm of Karlin and Altschul (1990), modified
as in Karlin and Altschul (1993).

Computer implementations of these mathematical algorithms can be utilized
for comparison of sequences to determine sequence identity. Such implementations
include, but are not limited to: CLUSTAL in the PC/Gene program (available from
Intelligenetics, Mountain View, California); the ALIGN program (Version 2.0) and
GAP, BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics
Software Package, Version 8 (available from Genetics Computer Group (GCG), 575
Science Drive, Madison, Wisconsin, USA). Alignments using these programs can
be performed using the default parameters. The CLUSTAL program is well
described by Higgins et al. (1988); Higgins et al. (1989); Corpet et al. (1988);
Huang et al. (1992); and Pearson et al. (1994). The ALIGN program is based on the
algorithm of Myers and Miller, supra. The BLAST programs of Altschul et al.
(1990) are based on the algorithm of Karlin and Altschul, supra. To obtain gapped
alignments for comparison purposes, Gapped BLAST (in BLAST 2.0) can be
utilized as described in Altschul et al. (1997). Alternatively, PSI-BLAST (in
BLAST 2.0) can be used to perform an iterated search that detects distant
relationships between molecules. See Altschul et al., supra. When utilizing
BLAST, Gapped BLAST, or PSI-BLAST, the default parameters of the respective
programs (e.g., BLASTN for nucleotide sequences, BLASTX for proteins) can be
used. See http://www.ncbi.nlm.nih.gov. Alignment may also be performed
manually by inspection.

The term "sequence identity" means that two polynucleotide sequences are 
identical (i.e., on a nucleotide-by-nucleotide basis) over the window of comparison. 

The term "percentage of sequence identity" means that two polynucleotide
sequences are identical (i.e., on a nucleotide-by-nucleotide basis) for the stated
proportion of nucleotides over the window of comparison. The "percentage of
sequence identity" is calculated by comparing two optimally aligned sequences over
the window of comparison, determining the number of positions at which the
identical nucleic acid base (e.g., A, T, C, G, U, or I) occurs in both sequences to
yield the number of matched positions, dividing the number of matched positions by
the total number of positions in the window of comparison (i.e., the window size),
and multiplying the result by 100 to yield the percentage of sequence identity. The
term "substantial identity" as used herein denotes a characteristic of a
polynucleotide sequence, wherein the polynucleotide comprises a sequence that has
at least 60%, preferably at least 65%, more preferably at least 70%, up to about
85%, and even more preferably at least 90 to 95%, more usually at least 99%,
sequence identity as compared to a reference sequence over a comparison window
of at least 20 nucleotide positions, frequently over a window of at least 20-50
nucleotides, and preferably at least 300 nucleotides, wherein the percentage of
sequence identity is calculated by comparing the reference sequence to the
polynucleotide sequence which may include deletions or additions which total 20
percent or less of the reference sequence over the window of comparison. The
reference sequence may be a subset of a larger sequence.

As applied to polypeptides, the term "substantial identity" means that two
peptide sequences, when optimally aligned, such as by the programs GAP or
BESTFIT using default gap weights, share at least about 85% sequence identity,
preferably at least about 90% sequence identity, more preferably at least about 95%
sequence identity, and most preferably at least about 99% sequence identity.
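
By way of illustration only, the "percentage of sequence identity" calculation defined above for polynucleotides (matched positions divided by window size, multiplied by 100) can be sketched in Python as follows. The function name `percent_identity`, the use of '-' to mark a gap, and the two example sequences are assumptions made for this sketch; the two inputs are assumed to have been optimally aligned beforehand by one of the programs named above.

```python
def percent_identity(reference: str, query: str) -> float:
    """Percentage of sequence identity over a comparison window.

    Both strings must already be aligned and of equal length; '-' marks a gap.
    The window size is the number of aligned positions, gaps included.
    """
    if len(reference) != len(query):
        raise ValueError("aligned sequences must have equal length")
    if not reference:
        raise ValueError("comparison window is empty")
    # A matched position carries the identical base in both sequences.
    matches = sum(
        1
        for r, q in zip(reference.upper(), query.upper())
        if r == q and r != "-"
    )
    return 100.0 * matches / len(reference)


# Hypothetical 20-nucleotide comparison window differing at two positions.
window_ref = "ATGGCCAAGCTGACCAGCGC"
window_qry = "ATGGCTAAGCTGACAAGCGC"
print(percent_identity(window_ref, window_qry))  # 18 of 20 match -> 90.0
```

A gapped position counts toward the window size but never as a match, which is consistent with dividing by the total number of positions in the window.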

The Synthetic Nucleic Acid Molecules and Methods of the Invention 

The invention provides compositions comprising synthetic nucleic acid
molecules, as well as methods for preparing those molecules which yield synthetic
nucleic acid molecules that are efficiently expressed as a polypeptide or protein with
desirable characteristics including reduced inappropriate or unintended transcription
characteristics when expressed in a particular cell type.

Natural selection is the hypothesis that genotype-environment interactions
occurring at the phenotypic level lead to differential reproductive success of
individuals and hence to modification of the gene pool of a population. It is
generally accepted that the amino acid sequence of a protein found in nature has
undergone optimization by natural selection. However, amino acids exist within the
sequence of a protein that do not contribute significantly to the activity of the
protein and these amino acids can be changed to other amino acids with little or no
consequence. Furthermore, a protein may be useful outside its natural environment
or for purposes that differ from the conditions of its natural selection. In these
circumstances, the amino acid sequence can be synthetically altered to better adapt
the protein for its utility in various applications.

Likewise, the nucleic acid sequence that encodes a protein is also optimized
by natural selection. The relationship between coding DNA and its transcribed
RNA is such that any change to the DNA affects the resulting RNA. Thus, natural
selection works on both molecules simultaneously. However, this relationship does
not exist between nucleic acids and proteins. Because multiple codons encode the
same amino acid, many different nucleotide sequences can encode an identical
protein. A specific protein composed of 500 amino acids can theoretically be
encoded by more than 10^^^ different nucleic acid sequences. 
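
For illustration, the number of distinct nucleotide sequences that encode a given protein follows from the redundancy of the genetic code: it is the product, over every residue, of the number of codons available for that residue. The Python sketch below uses the codon counts of the standard genetic code; the function names are hypothetical, and the logarithm is reported because the count grows too quickly to read comfortably for long proteins.

```python
import math

# Codons per amino acid in the standard genetic code (one-letter codes).
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def synonymous_encodings(protein: str) -> int:
    """Count the distinct coding sequences that encode `protein` exactly."""
    return math.prod(CODONS_PER_AA[aa] for aa in protein.upper())


def log10_encodings(protein: str) -> float:
    """Base-10 logarithm of the count, convenient for long proteins."""
    return sum(math.log10(CODONS_PER_AA[aa]) for aa in protein.upper())


print(synonymous_encodings("MKL"))  # 1 * 2 * 6 = 12
```

Applying `log10_encodings` to any particular 500-residue sequence gives the order of magnitude for that protein; the value depends on its amino acid composition.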

Natural selection acts on nucleic acids to achieve proper encoding of the
corresponding protein. Presumably, other properties of nucleic acid molecules are
also acted upon by natural selection. These properties include codon usage
frequency, RNA secondary structure, the efficiency of intron splicing, and
interactions with transcription factors or other nucleic acid binding proteins. These
other properties may alter the efficiency of protein translation and the resulting
phenotype. Because of the redundant nature of the genetic code, these other
attributes can be optimized by natural selection without altering the corresponding
amino acid sequence.

Under some conditions, it is useful to synthetically alter the natural
nucleotide sequence encoding a protein to better adapt the protein for alternative
applications. A common example is to alter the codon usage frequency of a gene
when it is expressed in a foreign host. Although redundancy in the genetic code
allows amino acids to be encoded by multiple codons, different organisms favor
some codons over others. The codon usage frequencies tend to differ most for
organisms with widely separated evolutionary histories. It has been found that when
transferring genes between evolutionarily distant organisms, the efficiency of
protein translation can be substantially increased by adjusting the codon usage
frequency (see U.S. Patent Nos. 5,096,825, 5,670,356 and 5,874,304).
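
As a non-limiting illustration of what is meant by codon usage frequency, the Python sketch below tabulates, for a coding sequence, the fraction of each codon among the codons used for the same amino acid. The function name `codon_usage` and the short example sequence are hypothetical and are not taken from the cited patents; the codon table is the standard genetic code, and the input is assumed to be an in-frame, uppercase DNA sequence.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_usage(cds: str) -> dict:
    """Fraction of each codon among the codons for the same amino acid."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    per_amino_acid = Counter()
    for codon, n in counts.items():
        per_amino_acid[CODON_TABLE[codon]] += n
    return {c: n / per_amino_acid[CODON_TABLE[c]] for c, n in counts.items()}


# Hypothetical fragment: three leucine codons (CTG, CTG, CTC) and one lysine codon.
print(codon_usage("CTGCTGCTCAAG"))
```

Comparing such a table for a gene against the table for highly expressed genes of the intended host indicates which codons are candidates for replacement.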

Because of the need for evolutionary distance, the codon usage of reporter
genes often does not correspond to the optimal codon usage of the experimental
cells. Examples include β-galactosidase (β-gal) and chloramphenicol
acetyltransferase (cat) reporter genes that are derived from E. coli and are
commonly used in mammalian cells; the β-glucuronidase (gus) reporter gene that is
derived from E. coli and commonly used in plant cells; the firefly luciferase (luc)
reporter gene that is derived from an insect and commonly used in plant and
mammalian cells; and the Renilla luciferase and green fluorescent protein (gfp)
reporter genes which are derived from coelenterates and are commonly used in plant
and mammalian cells. To achieve sensitive quantitation of reporter gene expression,
the activity of the gene product must not be endogenous to the experimental host
cells. Thus, reporter genes are usually selected from organisms having unique and
distinctive phenotypes. Consequently, these organisms often have widely separated
evolutionary histories from the experimental host cells.

Previously, to create genes having a more optimal codon usage frequency
but still encoding the same gene product, a synthetic nucleic acid sequence was
made by replacing existing codons with codons that were generally more favorable
to the experimental host cell (see U.S. Patent Nos. 5,096,825, 5,670,356 and
5,874,304). The result was a net improvement in codon usage frequency of the
synthetic gene. However, the optimization of other attributes was not considered
and so these synthetic genes likely did not reflect genes optimized by natural
selection.

In particular, improvements in codon usage frequency are intended only for
optimization of an RNA sequence based on its role in translation into a protein.
Thus, previously described methods did not address how the sequence of a synthetic
gene affects the role of DNA in transcription into RNA. Most notably,
consideration had not been given as to how transcription factors may interact with
the synthetic DNA and consequently modulate or otherwise influence gene
transcription. For genes found in nature, the DNA would be optimally transcribed
by the native host cell and would yield an RNA that encodes a properly folded gene
product. In contrast, synthetic genes have previously not been optimized for
transcriptional characteristics. Rather, this property has been ignored or left to
chance.

This concern is important for all genes, but particularly important for
reporter genes, which are most commonly used to quantitate transcriptional behavior
in the experimental host cells. Hundreds of transcription factors have been
identified in different cell types under different physiological conditions, and likely
more exist but have not yet been identified. All of these transcription factors can
influence the transcription of an introduced gene. A useful synthetic reporter gene
of the invention has a minimal risk of influencing or perturbing intrinsic
transcriptional characteristics of the host cell because the structure of that gene has
been altered. A particularly useful synthetic reporter gene will have desirable
characteristics under a new set and/or a wide variety of experimental conditions. To
best achieve these characteristics, the structure of the synthetic gene should have
minimal potential for interacting with transcription factors within a broad range of
host cells and physiological conditions. Minimizing potential interactions between a
reporter gene and a host cell's endogenous transcription factors increases the value
of a reporter gene by reducing the risk of inappropriate transcriptional characteristics
of the gene within a particular experiment, increasing applicability of the gene in
various environments, and increasing the acceptance of the resulting experimental
data.

In contrast, a reporter gene comprising a native nucleotide sequence, based
on a genomic or cDNA clone from the original host organism, may interact with
transcription factors when expressed in an exogenous host. This risk stems from
two circumstances. First, the native nucleotide sequence contains sequences that
were optimized through natural selection to influence gene transcription within the
native host organism. However, these sequences might also influence transcription
when the gene is expressed in exogenous hosts, i.e., out of context, thus interfering
with its performance as a reporter gene. Second, the nucleotide sequence may
inadvertently interact with transcription factors that were not present in the native
host organism, and thus did not participate in its natural selection. The probability
of such inadvertent interactions increases with greater evolutionary separation
between the experimental cells and the native organism of the reporter gene.

These potential interactions with transcription factors would likely be
disrupted when using a synthetic reporter gene having alterations in codon usage
frequency. However, a synthetic reporter gene sequence, designed by choosing
codons based only on codon usage frequency, is likely to contain other unintended
transcription factor binding sites since the synthetic gene has not been subjected to
the benefit of natural selection to correct inappropriate transcriptional activities.
Inadvertent interactions with transcription factors could also occur whenever the
encoded amino acid sequence is artificially altered, e.g., to introduce amino acid
substitutions. Similarly, these changes have not been subjected to natural selection,
and thus may exhibit undesired characteristics.

Thus, the invention provides a method for preparing synthetic nucleic acid
sequences that reduce the risk of undesirable interactions of the nucleic acid with
transcription factors when expressed in a particular host cell, thereby reducing
inappropriate or unintended transcriptional characteristics. Preferably, the method
yields synthetic genes containing improved codon usage frequencies for a particular
host cell and with a reduced occurrence of transcription factor binding sites. The
invention also provides a method of preparing synthetic genes containing improved
codon usage frequencies with a reduced occurrence of transcription factor binding
sites and additional beneficial structural attributes. Such additional attributes
include the absence of inappropriate RNA splicing junctions, poly(A) addition
signals, undesirable restriction sites, ribosomal binding sites, and secondary
structural motifs such as hairpin loops.

Also provided is a method for preparing two synthetic genes encoding the
same or highly similar proteins ("codon distinct" versions). Preferably, the two
synthetic genes have a reduced ability to hybridize to a common polynucleotide
probe sequence, or have a reduced risk of recombining when present together in
living cells. To detect recombination, PCR amplification of the reporter sequences
using primers complementary to flanking sequences and sequencing of the amplified
sequences may be employed.

To select codons for the synthetic nucleic acid molecules of the invention,
preferred codons have a relatively high codon usage frequency in a selected host
cell, and their introduction results in the introduction of relatively few transcription
factor binding sites, relatively few other undesirable structural attributes, and
optionally a characteristic that distinguishes the synthetic gene from another gene
encoding a highly similar protein. Thus, the synthetic nucleic acid product obtained
by the method of the invention is a synthetic gene with an improved level of expression
due to improved codon usage frequency, a reduced risk of inappropriate
transcriptional behavior due to a reduced number of undesirable transcription
regulatory sequences, and optionally any additional characteristic due to other
criteria that may be employed to select the synthetic sequence.

The invention may be employed with any nucleic acid sequence, e.g., a
native sequence such as a cDNA or one which has been manipulated in vitro, e.g., to
introduce specific alterations such as the introduction or removal of a restriction
enzyme recognition site, the alteration of a codon to encode a different amino acid
or to encode a fusion protein, or to alter GC or AT content (% of composition) of
nucleic acid molecules. Moreover, the method of the invention is useful with any
gene, but particularly useful for reporter genes as well as other genes associated with
the expression of reporter genes, such as selectable markers. Preferred genes
include, but are not limited to, those encoding lactamase (β-gal), neomycin
resistance (Neo), CAT, GUS, galactopyranoside, GFP, xylosidase, thymidine
kinase, arabinosidase and the like. As used herein, a "marker gene" or "reporter
gene" is a gene that imparts a distinct phenotype to cells expressing the gene and
thus permits cells having the gene to be distinguished from cells that do not have the
gene. Such genes may encode either a selectable or screenable marker, depending
on whether the marker confers a trait which one can 'select' for by chemical means,
i.e., through the use of a selective agent (e.g., a herbicide, antibiotic, or the like), or
whether it is simply a "reporter" trait that one can identify through observation or
testing, i.e., by 'screening'. Elements of the present disclosure are exemplified in
detail through the use of particular marker genes. Of course, many examples of
suitable marker genes or reporter genes are known to the art and can be employed in
the practice of the invention. Therefore, it will be understood that the following
discussion is exemplary rather than exhaustive. In light of the techniques disclosed
herein and the general recombinant techniques which are known in the art, the
present invention renders possible the alteration of any gene.

Exemplary marker genes include, but are not limited to, a neo gene, a β-gal
gene, a gus gene, a cat gene, a gpt gene, a hyg gene, a hisD gene, a ble gene, a mprt
gene, a bar gene, a nitrilase gene, a mutant acetolactate synthase gene (ALS) or
acetoacid synthase gene (AAS), a methotrexate-resistant dhfr gene, a dalapon
dehalogenase gene, a mutated anthranilate synthase gene that confers resistance to
5-methyl tryptophan (WO 97/26366), an R-locus gene, a β-lactamase gene, a xylE
gene, an α-amylase gene, a tyrosinase gene, a luciferase (luc) gene (e.g., a Renilla
reniformis luciferase gene, a firefly luciferase gene, or a click beetle luciferase
(Pyrophorus plagiophthalamus) gene), an aequorin gene, or a green fluorescent
protein gene. Included within the terms selectable or screenable marker genes are
also genes which encode a "secretable marker" whose secretion can be detected as a
means of identifying or selecting for transformed cells. Examples include markers
which encode a secretable antigen that can be identified by antibody interaction, or
even secretable enzymes which can be detected by their catalytic activity.
Secretable proteins fall into a number of classes, including small, diffusible proteins
detectable, e.g., by ELISA, and proteins that are inserted or trapped in the cell
membrane.

The method of the invention can be performed by, although it is not limited
to, a recursive process. The process includes assigning preferred codons to each
amino acid in a target molecule, e.g., a native nucleotide sequence, based on codon
usage in a particular species, identifying potential transcription regulatory sequences
such as transcription factor binding sites in the nucleic acid sequence having
preferred codons, e.g., using a database of such binding sites, optionally identifying
other undesirable sequences, and substituting an alternative codon (i.e., encoding the
same amino acid) at positions where undesirable transcription factor binding sites or
other sequences occur. For codon distinct versions, alternative preferred codons are
substituted in each version. If necessary, the identification and elimination of
potential transcription factor or other undesirable sequences can be repeated until a
nucleotide sequence is achieved containing a maximum number of preferred codons
and a minimum number of undesired sequences including transcription regulatory
sequences or other undesirable sequences. Also, optionally, desired sequences, e.g.,
restriction enzyme recognition sites, can be introduced. After a synthetic nucleic
acid molecule is designed and constructed, its properties relative to the parent
nucleic acid sequence can be determined by methods well known to the art. For
example, the expression of the synthetic and target nucleic acid molecules in a series
of vectors in a particular cell can be compared.
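
The process just described can be sketched, for illustration only, as a short Python loop: assign the first-choice codon at every position, scan for undesired motifs, and substitute an alternative synonymous codon wherever doing so lowers the motif count, repeating until no motif remains or no single substitution helps. Everything named in the sketch is an assumption: the preference-ordered codon lists in `PREFERRED` are hypothetical rather than a real usage table, and the two motifs in `UNDESIRED` (a GC-box-like GGGCGG and an E-box-like CACGTG) merely stand in for a lookup against a database of transcription factor binding sites.

```python
# Hypothetical preference-ordered codon lists (illustrative, not a real usage table).
PREFERRED = {
    "M": ["ATG"],
    "L": ["CTG", "CTC"],
    "G": ["GGC", "GGA"],
    "H": ["CAC", "CAT"],
    "V": ["GTG", "GTC"],
    "K": ["AAG", "AAA"],
}
# Illustrative stand-ins for database hits: GC-box-like and E-box-like motifs.
UNDESIRED = ["GGGCGG", "CACGTG"]


def count_hits(dna):
    """Total occurrences (overlaps allowed) of all undesired motifs."""
    return sum(
        1
        for motif in UNDESIRED
        for i in range(len(dna) - len(motif) + 1)
        if dna.startswith(motif, i)
    )


def design(protein, max_rounds=10):
    """Assign preferred codons, then swap synonymous codons to remove motifs."""
    codons = [PREFERRED[aa][0] for aa in protein]
    for _ in range(max_rounds):
        best = count_hits("".join(codons))
        if best == 0:
            break
        improved = False
        for pos, aa in enumerate(protein):
            for alt in PREFERRED[aa][1:]:
                trial = codons[:pos] + [alt] + codons[pos + 1:]
                hits = count_hits("".join(trial))
                if hits < best:  # accept only changes that remove motifs
                    codons, best, improved = trial, hits, True
                    break
        if not improved:
            break  # no single synonymous change helps; stop rather than loop
    return "".join(codons)


naive = "".join(PREFERRED[aa][0] for aa in "MLGGHVK")
print(naive, count_hits(naive))      # first-choice codons create both motifs
designed = design("MLGGHVK")
print(designed, count_hits(designed))
```

In this toy case the first-choice codons alone create both motifs, one of them across a codon boundary, which is the situation the text describes for sequences designed on codon usage frequency only. A fuller implementation would also weigh how far each substitution departs from the preferred codon.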

Thus, generally, the method of the invention comprises identifying a target
nucleic acid sequence, such as a vector backbone, a reporter gene or a selectable
marker gene, and a host cell of interest, for example, a plant (dicot or monocot),
fungus, yeast or mammalian cell. Preferred host cells are mammalian host cells
such as CHO, COS, 293, HeLa, CV-1 and NIH3T3 cells. Based on preferred codon
usage in the host cell(s) and, optionally, low codon usage in the host cell(s), e.g.,
high usage mammalian codons and low usage E. coli and mammalian codons,
codons to be replaced are determined. For codon distinct versions of two synthetic
nucleic acid molecules, alternative preferred codons are introduced to each version.
Thus, for amino acids having more than two codons, one preferred codon is
introduced to one version and another preferred codon is introduced to the other
version. For amino acids having six codons, the two codons with the largest number
of mismatched bases are identified and one is introduced to one version and the
other codon is introduced to the other version. Concurrent with, subsequent to or prior to
selecting codons to be replaced, desired and undesired sequences, such as undesired
transcriptional regulatory sequences, in the target sequence are identified. These
sequences can be identified using databases and software such as EPD, NNPD,
REBASE, TRANSFAC, TESS, GenePro, MAR (www.ncer.org/MAR-search) and
BCM Gene Finder, further described herein. After the sequences are identified, the
modification(s) are introduced. Once a desired synthetic nucleic acid sequence is
obtained, it can be prepared by methods well known to the art (such as PCR with
overlapping primers), and its structural and functional properties compared to the
target nucleic acid sequence, including, but not limited to, percent homology,
presence or absence of certain sequences, for example, restriction sites, percent of
codons changed (such as an increased or decreased usage of certain codons) and
expression rates.
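
The rule stated above for amino acids having six codons (identify the two codons with the largest number of mismatched bases and assign one to each codon distinct version) can be illustrated with the following Python sketch. The codon lists are those of the standard genetic code; the function names are hypothetical, and the sketch deliberately ignores codon usage frequency, which in practice would narrow the choice among equally divergent pairs.

```python
from itertools import combinations

# Standard genetic code codons for the three six-codon amino acids.
SIX_CODON = {
    "Leu": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "Ser": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
    "Arg": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
}


def mismatches(a, b):
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


def most_divergent_pairs(codons):
    """Return (largest mismatch count, every codon pair achieving it)."""
    pairs = list(combinations(codons, 2))
    top = max(mismatches(a, b) for a, b in pairs)
    return top, [(a, b) for a, b in pairs if mismatches(a, b) == top]


for amino_acid, codons in SIX_CODON.items():
    top, pairs = most_divergent_pairs(codons)
    print(amino_acid, top, pairs)
```

Serine is the only one of the three for which a pair differing at all three positions exists (for example TCT and AGC); for leucine and arginine the most divergent pairs differ at two positions.
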
As described below, the method was used to create synthetic reporter genes
encoding Renilla reniformis luciferase, and two click beetle luciferases (one
emitting green light and the other emitting red light). For both systems, the
synthetic genes support much greater levels of expression than the corresponding
native or parent genes for the protein. In addition, the native and parent genes
demonstrated anomalous transcription characteristics when expressed in mammalian
cells, which were not evident in the synthetic genes. In particular, basal expression
of the native or parent genes is relatively high. Furthermore, the expression is
induced to very high levels by an enhancer sequence in the absence of known
promoters. The synthetic genes show lower basal expression and do not show the
anomalous enhancer behavior. Presumably, the enhancer is activating
transcriptional elements found in the native genes that are absent in the synthetic
genes. The results clearly show that the synthetic nucleic acid sequences exhibit
superior performance as reporter genes.

Exemplary Uses of the Molecules of the Invention 

The synthetic genes of the invention preferably encode the same proteins as
their native counterparts (or nearly so), but have improved codon usage while being
largely devoid of known transcription regulatory elements in the coding region. (It
is recognized that a small number of amino acid changes may be desired to enhance
a property of the native counterpart protein, e.g., to enhance luminescence of a
luciferase.) This increases the level of expression of the protein the synthetic gene
encodes and reduces the risk of anomalous expression of the protein. For example,
studies of many important events of gene regulation, which may be mediated by
weak promoters, are limited by insufficient reporter signals from inadequate
expression of the reporter proteins. The synthetic luciferase genes described herein
permit detection of weak promoter activity because of the large increase in level of
expression, which enables increased detection sensitivity. Also, the use of some
selectable markers may be limited by the expression of that marker in an exogenous
cell. Thus, synthetic selectable marker genes which have improved codon usage for
that cell, and have a decrease in other undesirable sequences (e.g., transcription
factor binding sites), can permit the use of those markers in cells that otherwise were
undesirable as hosts for those markers.

Promoter crosstalk is another concern when a co-reporter gene is used to
normalize transfection efficiencies. With the enhanced expression of synthetic
genes, the amount of DNA containing strong promoters can be reduced, or DNA
containing weaker promoters can be employed, to drive the expression of the
co-reporter. In addition, there may be a reduction in the background expression from
the synthetic reporter genes of the invention. This characteristic makes synthetic
reporter genes more desirable by minimizing the sporadic expression from the genes
and reducing the interference resulting from other regulatory pathways.

The use of reporter genes in imaging systems, which can be used for in vivo
biological studies or drug screening, is another use for the synthetic genes of the
invention. Due to its increased level of expression, the protein encoded by a
synthetic gene is more readily detectable by an imaging system. In fact, using a
synthetic Renilla luciferase gene, luminescence in transfected CHO cells was
detected visually without the aid of instrumentation.

In addition, the synthetic genes may be used to express fusion proteins, for
example fusions with secretion leader sequences or cellular localization sequences,
to study transcription in difficult-to-transfect cells such as primary cells, and/or to
improve the analysis of regulatory pathways and genetic elements. Other uses
include, but are not limited to, the detection of rare events that require extreme
sensitivity (e.g., studying RNA recoding), use with IRES, to improve the efficiency
of in vitro translation or in vitro transcription-translation coupled systems such as
TNT (Promega Corp., Madison, WI), study of reporters optimized to different host
organisms (e.g., plants, fungus, and the like), use of multiple genes as co-reporters
to monitor drug toxicity, as reporter molecules in multiwell assays, and as reporter
molecules in drug screening with the advantage of minimizing possible interference
of reporter signal by different signal transduction pathways and other regulatory
mechanisms.

Additionally, uses for the nucleic acid molecules of the invention include
fluorescence activated cell sorting (FACS), fluorescent microscopy, to detect and/or
measure the level of gene expression in vitro and in vivo (e.g., to determine
promoter strength), subcellular localization or targeting (fusion protein), as a
marker, in calibration, in a kit (e.g., for dual assays), for in vivo imaging, to analyze
regulatory pathways and genetic elements, and in multi-well formats.

With respect to synthetic DNA encoding luciferases, the use of synthetic
click beetle luciferases provides advantages such as the measurement of dual
reporters. As Renilla luciferase is better suited for in vivo imaging (because it does
not depend on ATP or Mg2+ for reaction, unlike firefly luciferase, and because
coelenterazine is more permeable to the cell membrane than luciferin), the synthetic
Renilla luciferase gene can be employed in vivo. Further, the synthetic Renilla
luciferase has improved fidelity and sensitivity in dual luciferase assays, e.g., for
biological analysis or in a drug screening platform.

Demonstration of the Invention Using Luciferase Genes 

The reporter genes for click beetle luciferase and Renilla luciferase were
used to demonstrate the invention because the reactions catalyzed by the proteins they
encode are significantly easier to quantify than the product of most genes. However,
for the purposes of demonstrating the present invention they represent genes in
general.

Although the click beetle luciferase and Renilla luciferase genes share the
name "luciferase", this should not be interpreted to mean that they originate from
the same family of genes. The two luciferase proteins are evolutionarily distinct;
they have fundamentally different traits and physical structures, they use vastly
different substrates (Figure 17), and they evolved from completely different families
of genes. The click beetle luciferase is 61 kD in size, uses luciferin as a substrate
and evolved from the CoA synthetases. The Renilla luciferase originates from the
sea pansy Renilla reniformis, is 35 kD in size, uses coelenterazine as a substrate and
evolved from the α/β hydrolases. The only shared trait of these two enzymes is that
the reaction they catalyze results in light output. They are no more similar for
resulting in light output than any other two enzymes would be, for example, simply
because the reaction they catalyze results in heat.

Bioluminescence is the light produced in certain organisms as a result of
luciferase-mediated oxidation reactions. The luciferase genes, e.g., the genes from
luminous beetles, sea pansy, and, in particular, the luciferase from Photinus pyralis
(the common firefly of North America), are currently the most popular luminescent
reporter genes. Reference is made to Bronstein et al. (1994) for a review of
luminescent reporter gene assays and to Wood (1995) for a review of the evolution
of beetle bioluminescence. See Figure 17 for an illustration of the reactions
catalyzed by each of firefly and click beetle luciferases (17A) and Renilla luciferase
(17B).

Firefly luciferase and Renilla luciferase are highly valuable as genetic
reporters due to the convenience, sensitivity and linear range of the luminescence
assay. Today, luciferase is used in virtually every type of experimental biological
system, including, but not limited to, prokaryotic and eukaryotic cell culture,
transgenic plants and animals, and cell-free expression systems. The firefly
luciferase enzyme is derived from a specific North American beetle, Photinus
pyralis. The firefly luciferase enzyme and the click beetle luciferase enzyme are
monomeric proteins (61 kDa) which generate light through monooxygenation of
beetle luciferin utilizing ATP and O2 (Figure 17A). The Renilla luciferase is derived
from the sea pansy Renilla reniformis. The Renilla luciferase enzyme is a 36 kDa
monomeric protein that utilizes O2 and coelenterazine to generate light (Figure 17B).
The gene encoding firefly luciferase was cloned from Photinus pyralis, and
demonstrated to produce active enzyme in E. coli (de Wet et al., 1987). The cDNA
encoding firefly luciferase (luc) continues to gain favor as the gene of choice for
reporting genetic activity in animal, plant and microbial cells. The firefly luciferase
reaction, modified by the addition of CoA to produce persistent light emission,
provides an extremely sensitive and rapid in vitro assay for quantifying firefly
luciferase expression in small samples of transfected cells or tissues.
To use firefly luciferase or click beetle luciferase as a genetic reporter,
extracts of cells expressing the luciferase are mixed with substrates (beetle luciferin,
Mg2+, ATP, and O2), and luminescence is measured immediately. The assay is very
rapid and sensitive, providing gene expression data with little effort. The
conventional firefly luciferase assay has been further improved by including
coenzyme A in the assay reagent to yield greater enzyme turnover and thus greater
luminescence intensity (Promega Luciferase Assay Reagent, Cat.# E1500, Promega
Corporation, Madison, Wis.). Using this reagent, luciferase activity can be readily
measured in luminometers or scintillation counters. Firefly and click beetle
luciferase activity can also be detected in living cells in culture by adding luciferin
to the growth medium. This in situ luminescence relies on the ability of beetle
luciferin to diffuse through cellular and peroxisomal membranes and on the
intracellular availability of ATP and O2 in the cytosol and peroxisome.

Further, although reporter genes are widely used to measure transcription 
events, their utility can be limited by the fidelity and efficiency of reporter
expression. For example, in U.S. Patent No. 5,670,356, a firefly luciferase gene 
(referred to as luc+) was modified to improve the level of luciferase expression. 
While a higher level of expression was observed, it was not determined that higher 
expression had improved regulatory control. 

The invention will be further described by the following nonlimiting 
examples. 

Example 1 

Synthetic Click Beetle (RD and GR) Luciferase Nucleic Acid Molecules
LucPplYG is a wild-type click beetle luciferase that emits yellow-green
luminescence (Wood, 1989). A mutant of LucPplYG named YG#81-6G01 was
envisioned. YG#81-6G01 lacks a peroxisome targeting signal, has a lower Km for
luciferin and ATP, has increased signal stability and increased temperature stability
when compared to the wild type (PCT/WO99/14336). YG #81-6G01 was mutated to
emit green luminescence by changing Ala at position 224 to Val (A224V is a
green-shifting mutation), or to emit red luminescence by simultaneously introducing
the amino acid substitutions A224H, S247H, N346I, and H348Q (red-shifting mutation
set) (PCT/WO95/18853).

Using YG #81-6G01 as a parent gene, two synthetic gene sequences were
designed. One codes for a luciferase emitting green luminescence (GR) and one for
a luciferase emitting red luminescence (RD). Both genes were designed to 1) have
optimized codon usage for expression in mammalian cells, 2) have a reduced
number of transcriptional regulatory sites including mammalian transcription factor
binding sites, splice sites, poly(A) addition sites and promoters, as well as
prokaryotic (E. coli) regulatory sites, 3) be devoid of unwanted restriction sites, e.g.,
those which are likely to interfere with standard cloning procedures, and 4) have a
low DNA sequence identity compared to each other in order to minimize genetic
rearrangements when both are present inside the same cell. In addition, desired
sequences, e.g., a Kozak sequence or restriction enzyme recognition sites, may be
identified and introduced.

Not all design criteria could be met equally well at the same time. The
following priority was established for reduction of transcriptional regulatory sites:
elimination of transcription factor (TF) binding sites received the highest priority,
followed by elimination of splice sites and poly(A) addition sites, and finally
prokaryotic regulatory sites. When removing regulatory sites, the strategy was to
work from the least important to the most important to ensure that the most
important changes were made last. Then the sequence was rechecked for the
appearance of new lower priority sites and additional changes made as needed.
Thus, the process for designing the synthetic GR and RD gene sequences, using
computer programs described herein, involved 5 optionally iterative steps that are
detailed below.

1. Optimized codon usage and changed A224V to create GRver1,
   separately changed A224H, S247H, H348Q and N346I to create RDver1.
   These particular amino acid changes were maintained throughout all
   subsequent manipulations to the sequence.

2. Removed undesired restriction sites, prokaryotic regulatory sites, splice
   sites, poly(A) sites thereby creating GRver2 and RDver2.

3. Removed transcription factor binding sites (first pass) and removed any
   newly created undesired sites as listed in step 2 above thereby creating
   GRver3 and RDver3.

4. Removed transcription factor binding sites created by step 3 above
   (second pass) and removed any newly created undesired sites as listed in
   step 2 above thereby creating GRver4 and RDver4.

5. Removed transcription factor binding sites created by step 4 above (third
   pass) and confirmed absence of sites listed in step 2 above thereby
   creating GRver5 and RDver5.

6. Constructed the actual genes by PCR using synthetic oligonucleotides
   corresponding to fragments of GRver5 and RDver5 designed sequences
   (Figures 6 and 10) thereby creating GR6 and RD7. GR6, upon
   sequencing, was found to have the serine residue at amino acid position
   49 mutated to an asparagine and the proline at amino acid position 230
   mutated to a serine (S49N, P230S). RD7, upon sequencing, was found to
   have the histidine at amino acid position 36 mutated to a tyrosine
   (H36Y). These changes occurred during the PCR process.

7. The mutations described in step 6 above (S49N, P230S for GR6 and
   H36Y for RD7) were reversed to create GRver5.1 and RDver5.1.

8. RDver5.1 was further modified by changing the arginine codon at
   position 351 to a glycine codon (R351G) thereby creating RDver5.2 with
   improved spectral properties compared to RDver5.1.

9. RDver5.2 was further mutated to increase luminescence intensity thereby
   creating RD156-1H9 which encodes four additional amino acid changes
   (M2I, S349T, K488T, E538V) and three silent single base changes (SEQ
   ID NO: 18).



1. Optimize codon usage and introduce mutations determining luminescence color

The starting gene sequence for this design step was YG #81-6G01 (SEQ ID NO:2).
a) Optimize codon usage: 

The strategy was to adapt the codon usage for optimal expression in human
cells and at the same time to avoid E. coli low-usage codons. Based on these
requirements, the best two codons for expression in human cells for all amino acids
with more than two codons were selected (see Wada et al., 1990). In the selection of
codon pairs for amino acids with six codons, the selection was biased towards pairs
that have the largest number of mismatched bases to allow design of GR and RD
genes with minimum sequence identity (codon distinction):
Arg: CGC/CGT    Leu: CTG/TTG    Ser: TCT/AGC
Thr: ACC/ACT    Pro: CCA/CCT    Ala: GCC/GCT
Gly: GGC/GGT    Val: GTC/GTG    Ile: ATC/ATT
Based on this selection of codons, two gene sequences encoding the YG#81-6G01
luciferase protein sequence were computer generated. The two genes were designed
to have minimum DNA sequence identity and at the same time closely similar
codon usage. To achieve this, each codon in the two genes was replaced by a codon
from the limited list described above in an alternating fashion (e.g., Arg(n) is CGC in
gene 1 and CGT in gene 2, Arg(n+1) is CGT in gene 1 and CGC in gene 2).
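By way of illustration only, the alternating codon assignment might be sketched
as follows. The function name alternate_codons and the choice to count
occurrences separately for each amino acid are assumptions made for this sketch;
they are not taken from the computer programs actually used.

```python
# Preferred codon pairs as listed in the text (amino acids with more than two codons).
CODON_PAIRS = {
    "Arg": ("CGC", "CGT"), "Leu": ("CTG", "TTG"), "Ser": ("TCT", "AGC"),
    "Thr": ("ACC", "ACT"), "Pro": ("CCA", "CCT"), "Ala": ("GCC", "GCT"),
    "Gly": ("GGC", "GGT"), "Val": ("GTC", "GTG"), "Ile": ("ATC", "ATT"),
}

def alternate_codons(residues):
    """Return (gene1, gene2) codon lists for residues covered by CODON_PAIRS.

    The n-th occurrence of an amino acid takes the first codon of its pair in
    gene 1 and the second in gene 2; the (n+1)-th occurrence swaps them.
    """
    seen = {}  # occurrences so far, counted per amino acid (an assumption)
    gene1, gene2 = [], []
    for aa in residues:
        first, second = CODON_PAIRS[aa]
        count = seen.get(aa, 0)
        if count % 2 == 0:
            gene1.append(first)
            gene2.append(second)
        else:
            gene1.append(second)
            gene2.append(first)
        seen[aa] = count + 1
    return gene1, gene2

g1, g2 = alternate_codons(["Arg", "Leu", "Arg", "Val"])
print(g1)  # ['CGC', 'CTG', 'CGT', 'GTC']
print(g2)  # ['CGT', 'TTG', 'CGC', 'GTG']
```

Amino acids with one or two codons are outside the scope of this sketch and
would need to be handled separately.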

For subsequent steps in the design process it was anticipated that changes
had to be made to this limited optimal codon selection in order to meet other design
criteria; however, the following low-usage codons in mammalian cells were not
used unless needed to meet criteria of higher priority:
Arg: CGA    Leu: CTA    Ser: TCG
Pro: CCG    Val: GTA    Ile: ATA
Also, the following low-usage codons in E. coli were avoided when reasonable (note
that 3 of these match the low-usage list for mammalian cells):
Arg: CGA/CGG/AGA/AGG
Leu: CTA    Pro: CCC    Ile: ATA

b) Introduce mutations determining luminescence color:

Into one of the two codon-optimized gene sequences was introduced the 
single green-shifting mutation and into the other were introduced the 4 red-shifting 
mutations as described above. 

The two output sequences from this first design step were named GRver1
(version 1 GR) and RDver1 (version 1 RD). Their DNA sequences are 63%
identical (594 mismatches), while the proteins they encode differ only by the 4
amino acids that determine luminescence color (see Figures 2 and 3 for an
alignment of the DNA and protein sequences).

Tables 1 and 2 show, as an example, the codon usage for valine and leucine
in human genes, the parent gene YG#81-6G01, the codon-optimized synthetic genes
GRver1 and RDver1, as well as the final versions of the synthetic genes after
completion of step 5 in the design process (GRver5 and RDver5). For a complete
summary of the codon changes, see Figures 4 and 5.
Table 1: Valine

Codon   Human   Parent   GRver1   RDver1   GRver5   RDver5
GTA         4       13        0        0        1        1
GTC        13        4       25       24       21       26
GTG        24       12       25       25       25       17
GTT         9       20        0        0        3        5

Table 2: Leucine

Codon   Human   Parent   GRver1   RDver1   GRver5   RDver5
CTA         3        5        0        0        0        0
CTC        12        4        0        1       12       11
CTG        24        4       28       27       19       18
CTT         6       12        0        0        1        1
TTA         3       17        0        0        0        0
TTG         6       13       27       27       23       25



2. Remove undesired restriction sites, prokaryotic regulatory sites, splice sites and
poly(A) addition sites

The starting gene sequences for this design step were GRver1 and RDver1.
a) Remove undesired restriction sites:

To check for the presence and location of undesired restriction sites, the
sequences of both synthetic genes were compared against a database of restriction
enzyme recognition sequences (REBASE ver.712, http://www.neb.com/rebase)
using standard sequence analysis software (GenePro ver 6.10, Riverside Scientific
Ent.).

Specifically, the following restriction enzymes were classified as undesired:

- BamH I, Xho I, Sfi I, Kpn I, Sac I, Mlu I, Nhe I, Sma I, Xho I, Bgl II, Hind
  III, Nco I, Nar I, Xba I, Hpa I, Sal I,

- other cloning sites commonly used: EcoR I, EcoR V, Cla I,

- eight-base cutters (commonly used for complex constructs),

- BstE II (to allow N-terminal fusions),

- Xcm I (can generate A/T overhang used for T-vector cloning).

To eliminate undesired restriction sites when found in a synthetic gene, one or more
codons of the synthetic gene sequence were altered in accordance with the codon
optimization guidelines described in 1a above.
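As a minimal sketch of the kind of search described above, and not a
reproduction of the GenePro software or the REBASE database, the scan for
undesired restriction sites could look like the following. Only a handful of
the listed enzymes are included, the function name find_sites is hypothetical,
and the demonstration sequence is made up.

```python
# Recognition sequences for a few of the enzymes named above. All of these are
# palindromic, so scanning one strand is sufficient for this subset.
SITES = {
    "BamH I": "GGATCC", "Xho I": "CTCGAG", "Kpn I": "GGTACC",
    "Hind III": "AAGCTT", "Nco I": "CCATGG", "Xba I": "TCTAGA",
    "EcoR I": "GAATTC",
}

def find_sites(seq, sites=SITES):
    """Return sorted (position, enzyme) pairs; positions are 0-based."""
    seq = seq.upper()
    hits = []
    for name, motif in sites.items():
        start = seq.find(motif)
        while start != -1:
            hits.append((start, name))
            start = seq.find(motif, start + 1)  # allow overlapping hits
    return sorted(hits)

demo = "ATGGGATCCGCTAAGCTTGTC"  # made-up sequence, not from the designed genes
print(find_sites(demo))  # [(3, 'BamH I'), (12, 'Hind III')]
```

Enzymes with interrupted or degenerate recognition sequences (e.g., Sfi I,
Xcm I) would require pattern matching rather than exact string search.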

b) Remove prokaryotic (E. coli) regulatory sequences: 

To check for the presence and location of prokaryotic regulatory sequences,
the sequences of both synthetic genes were searched for the presence of the
following consensus sequences using standard sequence analysis software
(GenePro):

- TATAAT (-10 Pribnow box of promoter)

- AGGA or GGAG (ribosome binding site; only considered if paired with
  a methionine codon 12 or fewer bases downstream).

To eliminate such regulatory sequences when found in a synthetic gene, one or more
codons of the synthetic gene sequence were altered in accordance with the codon
optimization guidelines described in 1a above.
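The two prokaryotic checks can be illustrated with a short sketch. The reading
of "12 or fewer bases downstream" as the gap between the end of the 4-base
motif and the start of any ATG, irrespective of reading frame, is an assumption
of this sketch, as are the function names and the made-up test sequence.

```python
import re

def find_pribnow(seq):
    """0-based start positions of the -10 consensus TATAAT."""
    return [m.start() for m in re.finditer(r"(?=TATAAT)", seq.upper())]

def find_rbs(seq, max_gap=12):
    """Return (motif_start, atg_start) for AGGA/GGAG followed by an ATG.

    Assumption: at most max_gap bases may lie between the end of the motif
    and the first base of the ATG; the reading frame is not considered.
    """
    seq = seq.upper()
    hits = []
    for m in re.finditer(r"(?=(AGGA|GGAG))", seq):
        motif_end = m.start() + 4
        # An ATG beginning within max_gap bases ends within max_gap + 3 bases.
        window = seq[motif_end:motif_end + max_gap + 3]
        offset = window.find("ATG")
        if offset != -1:
            hits.append((m.start(), motif_end + offset))
    return hits

demo = "CCAGGAGGTACCATGGCTTATAATCC"  # made-up sequence for illustration
print(find_pribnow(demo))  # [18]
print(find_rbs(demo))      # [(2, 12), (3, 12)]
```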

c) Remove splice sites: 

To check for the presence and location of splice sites, the DNA strand
corresponding to the primary RNA transcript of each synthetic gene was searched
for the presence of the following consensus sequences (see Watson et al., 1983)
using standard sequence analysis software (GenePro):

- splice donor site: AG | GTRAGT (exon | intron), the search was
  performed for AGGTRAG and the lower stringency GGTRAGT;

- splice acceptor site: (Y)nNCAG | G (intron | exon), the search was
  performed with n = 1.

To eliminate splice sites found in a synthetic gene, one or more codons of the
synthetic gene sequence were altered in accordance with the codon optimization
guidelines described in 1a above. Splice acceptor sites were generally difficult to
eliminate in one gene without introducing them into the other gene because they
tended to contain one of the only two Gln codons (CAG); they were removed by
placing the Gln codon CAA in both genes at the expense of a slightly increased
sequence identity between the two genes.
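A hedged sketch of the splice-site search follows. The consensus strings are
those given above (with n = 1 the acceptor becomes YNCAGG); the translation of
the ambiguity codes R, Y and N into regular expressions, the function names and
the made-up test sequence are assumptions of this illustration.

```python
import re

IUPAC = {"R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

def to_regex(consensus):
    """Translate a consensus containing R, Y or N into a regular expression."""
    return "".join(IUPAC.get(base, base) for base in consensus)

SPLICE_PATTERNS = {
    "donor": "AGGTRAG",
    "donor (lower stringency)": "GGTRAGT",
    "acceptor (n = 1)": "YNCAGG",
}

def find_splice_sites(sense_strand):
    """Return sorted (0-based position, label) pairs on the sense strand."""
    seq = sense_strand.upper()
    hits = []
    for label, consensus in SPLICE_PATTERNS.items():
        pattern = "(?=" + to_regex(consensus) + ")"  # lookahead: overlapping hits
        hits += [(m.start(), label) for m in re.finditer(pattern, seq)]
    return sorted(hits)

demo = "GCCAGGTAAGCTTTCAGGAC"  # made-up sequence for illustration
print(find_splice_sites(demo))  # [(3, 'donor'), (12, 'acceptor (n = 1)')]
```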
d) Remove poly(A) addition sites:

To check for the presence and location of poly(A) addition sites, the
sequences of both synthetic genes were searched for the presence of the following
consensus sequence using standard sequence analysis software (GenePro):

- AATAAA.

To eliminate each poly(A) addition site found in a synthetic gene, one or more
codons of the synthetic gene sequence were altered in accordance with the codon
optimization guidelines described in 1a above. The two output sequences from this
second design step were named GRver2 and RDver2. Their DNA sequences are
63% identical (590 mismatches) (Figs. 2 and 3).
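The elimination of a site by altering a codon without changing the encoded
protein can be illustrated as follows. This is a simplified sketch under stated
assumptions: it tries any synonymous codon except the mammalian low-usage
codons named in step 1a, changes only one codon, and does not rank the
remaining candidates by codon preference. The function names and the
demonstration sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Mammalian low-usage codons that the text says were avoided where possible.
AVOID = {"CGA", "CTA", "TCG", "CCG", "GTA", "ATA"}

def translate(seq):
    return "".join(CODON_TO_AA[seq[i:i + 3]] for i in range(0, len(seq), 3))

def remove_motif(seq, motif="AATAAA"):
    """Try to remove `motif` with a single synonymous codon change.

    `seq` is an in-frame coding sequence (upper case, length a multiple of 3).
    Returns the edited sequence, the input unchanged if the motif is absent,
    or None if no single synonymous change leaves the sequence motif-free.
    """
    pos = seq.find(motif)
    if pos == -1:
        return seq
    first_codon = pos // 3
    last_codon = (pos + len(motif) - 1) // 3
    for i in range(first_codon, last_codon + 1):
        original = seq[3 * i:3 * i + 3]
        for alt, aa in CODON_TO_AA.items():
            if aa != CODON_TO_AA[original] or alt == original or alt in AVOID:
                continue
            candidate = seq[:3 * i] + alt + seq[3 * i + 3:]
            if motif not in candidate:
                return candidate
    return None

demo = "ATGAATAAAGCC"  # made-up sequence encoding Met-Asn-Lys-Ala
fixed = remove_motif(demo)
print(fixed, translate(fixed) == translate(demo))  # ATGAACAAAGCC True
```

In practice each change would also have to be rechecked against the other
design criteria, since a substitution that removes one site can create another.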

3. Remove transcription factor (TF) binding sites, then repeat steps 2 a-d

The starting gene sequences for this design step were GRver2 and RDver2.
To check for the presence, location and identity of potential TF binding sites, the
sequences of both synthetic genes were used as query sequences to search a database
of transcription factor binding sites (TRANSFAC v3.2). The TRANSFAC database
(http://transfac.gbf.de/TRANSFAC/index.html) holds information on gene
regulatory DNA sequences (TF binding sites) and proteins (TFs) that bind to and act
through them. The SITE table of TRANSFAC Release 3.2 contains 4,401 entries of
individual (putative) TF binding sites (including TF binding sites in eukaryotic
genes, in artificial sequences resulting from mutagenesis studies and in vitro
selection procedures based on random oligonucleotide mixtures or specific
theoretical considerations, and consensus binding sequences (from Faisst and
Meyer, 1992)).

The software tool used to locate and display these TF binding sites in the
synthetic gene sequences was TESS (Transcription Element Search Software,
http://agave.humgen.upenn.edu/tess/index.html). The filtered string-based search
option was used with the following user-defined search parameters:

- Factor Selection Attribute: Organism Classification

- Search Pattern: Mammalia

- Max. Allowable Mismatch %: 0

- Min. element length: 5

- Min. log-likelihood: 10

This parameter selection specifies that only mammalian TF binding sites
(approximately 1,400 of the 4,401 entries in the database) that are at least 5 bases
long will be included in the search. It further specifies that only TF binding sites
that have a perfect match in the query sequence and a minimum log likelihood
(LLH) score of 10 will be reported. The LLH scoring method assigns 2 to an
unambiguous match, 1 to a partially ambiguous match (e.g., A or T match W) and 0
to a match against 'N'. For example, a search with parameters specified above
would result in a "hit" (positive result or match) for TATAA (SEQ ID NO:240)
(LLH = 10), STRATG (SEQ ID NO:241) (LLH = 10), and MTTNCNNMA (SEQ
ID NO:242) (LLH = 10) but not for TRATG (SEQ ID NO: 243) (LLH = 9) if these
four TF binding sites were present in the query sequence. A lower stringency test
was performed at the end of the design process to re-evaluate the search parameters.
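The scoring rule stated above can be checked against the four example sites
with a few lines of code. This is only a sketch of the rule as described in the
text, not the TESS implementation; the function name llh_score is hypothetical,
and it is assumed that every ambiguity code other than N scores 1.

```python
def llh_score(site):
    """Score a consensus site: 2 per unambiguous base, 0 per N, 1 otherwise."""
    score = 0
    for base in site.upper():
        if base in "ACGT":
            score += 2
        elif base != "N":
            score += 1  # assumption: every non-N ambiguity code scores 1
    return score

for site in ("TATAA", "STRATG", "MTTNCNNMA", "TRATG"):
    print(site, llh_score(site), llh_score(site) >= 10)
# TATAA 10 True
# STRATG 10 True
# MTTNCNNMA 10 True
# TRATG 9 False
```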
When TESS was tested with a mock query sequence containing known TF
binding sites it was found that the program was unable to report matches to sites
ending with the 3' end of the query sequence. Thus, an extra nucleotide was added
to the 3' end of all query sequences to eliminate this problem.

The first search for TF binding sites using the parameters described above
found about 100 transcription factor binding sites (hits) for each of the two synthetic
genes (GRver2 and RDver2). All sites were eliminated by changing one or more
codons of the synthetic gene sequences in accordance with the codon optimization
guidelines described in 1a above. However, it was expected that some of these
changes created new TF binding sites, other regulatory sites, and new restriction
sites. Thus, steps 2 a-d were repeated as described, and 4 new restriction sites and 2
new splice sites were removed. The two output sequences from this third design
step were named GRver3 and RDver3. Their DNA sequences are 66% identical
(541 mismatches) (Figs. 2 and 3).

4. Remove new transcription factor (TF) binding sites, then repeat steps 2 a-d 

The starting gene sequences for this design step were GRver3 and RDver3.
This fourth step is an iteration of the process described in step 3. The search for
newly introduced TF binding sites yielded about 50 hits for each of the two
synthetic genes. All sites were eliminated by changing one or more codons of the
synthetic gene sequences in general accordance with the codon optimization
guidelines described in 1a above. However, more high to medium usage codons
were used to allow elimination of all TF binding sites. The lowest priority was
placed on maintaining low sequence identity between the GR and RD genes. Then
steps 2 a-d were repeated as described. The two output sequences from this fourth
design step were named GRver4 and RDver4. Their DNA sequences are 68%
identical (506 mismatches) (Figs. 2 and 3).

5. Remove new transcription factor (TF) binding sites, then repeat steps 2 a-d

The starting gene sequences for this design step were GRver4 and RDver4.
This fifth step is another iteration of the process described in step 3 above. The
search for new TF binding sites introduced in step 4 yielded about 20 hits for each
of the two synthetic genes. All sites were eliminated by changing one or more
codons of the synthetic gene sequences in general accordance with the codon
optimization guidelines described in 1a above. However, more high to medium
usage codons were used (these are all considered "preferred") to allow elimination
of all TF binding sites. The lowest priority was placed on maintaining low sequence
identity between the GR and RD genes. Then steps 2 a-d were repeated as
described. Only one acceptor splice site could not be eliminated. As a final step the
absence of all TF binding sites in both genes as specified in step 3 was confirmed.
The two output sequences from this fifth and last design step were named GRver5
and RDver5. Their DNA sequences are 69% identical (504 mismatches) (Figs. 2
and 3).

Additional evaluation of GRver5 and RDver5

a) Use lower stringency parameters for TESS:

The search for TF binding sites was repeated as described in step 3 above, but with
even less stringent user-defined parameters:

- setting LLH to 9 instead of 10 did not result in new hits;

- setting LLH to 0 through 8 (incl.) resulted in hits for two additional sites,
  MAMAG (22 hits) and CTKTK (24 hits);

- setting LLH to 8 and the minimum element length to 4, the search
  yielded (in addition to the two sites above) different 4-base sites for AP-1,
  NF-1, and c-Myb that are shortened versions of their longer respective
  consensus sites which were eliminated in steps 3-5 above.

It was not realistic to attempt complete elimination of these sites without
introduction of new sites, so no further changes were made.

b) Search different database: 

The Eukaryotic Promoter Database (release 45) contains information about reliably
mapped transcription start sites (1253 sequences) of eukaryotic genes. This
database was searched using BLASTN 1.4.11 with default parameters (optimized to
find nearly identical sequences rapidly; see Altschul et al., 1990) at the National
Center for Biotechnology Information site (http://www.ncbi.nlm.nih.gov/cgi-bin/BLAST).
To test this approach, a portion of pGL3-Control vector sequence
containing the SV40 promoter and enhancer was used as a query sequence, yielding
the expected hits to SV40 sequences. No hits were found when using the two
synthetic genes as query sequences.

Summary of GRver5 and RDver5 synthetic gene properties

Both genes, which at this stage were still only "virtual" sequences in the
computer, have a codon usage that strongly favors mammalian high-usage codons
and minimizes mammalian and E. coli low-usage codons. Figure 4 shows a
summary of the codon usage of the parent gene and the various synthetic gene
versions.

Both genes are also completely devoid of eukaryotic TF binding sites
consisting of more than four unambiguous bases, donor and acceptor splice sites
(one exception: GRver5 contains one splice acceptor site), poly(A) addition sites,
specific prokaryotic (E. coli) regulatory sequences, and undesired restriction sites.

The gene sequence identity between GRver5 and RDver5 is only 69% (504
base mismatches) while their encoded proteins are 99% identical (4 amino acid
mismatches), see Figures 2 and 3. Their identity with the parent sequence
YG#81-6G1 is 74% (GRver5) and 73% (RDver5), see Figure 2. Their base composition is
49.9% GC (GRver5) and 49.5% GC (RDver5), compared to 40.2% GC for the
parent YG#81-6G01.
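The identity and base composition figures quoted above are simple ratios. The
following sketch shows how such figures could be computed for two pre-aligned
sequences of equal length; the function names are hypothetical. The final line
is a consistency check that assumes the 504 mismatches are counted over the
1,626 bp length stated for each designed gene.

```python
def percent_identity(a, b):
    """Percent of positions that match; sequences must have equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)

def gc_percent(seq):
    """Percent of bases that are G or C."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)

print(percent_identity("ATGGTC", "ATGGTG"))  # 5 of 6 positions match
print(gc_percent("ATGGTC"))                  # 50.0

# Consistency check with the figures in the text (assumed 1,626 bp, 504 mismatches).
print(round(100.0 * (1626 - 504) / 1626, 1))  # 69.0
```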

Construction of synthetic genes

The two synthetic genes were constructed by assembly from synthetic
oligonucleotides in a thermocycler followed by PCR amplification of the full-length
genes (similar to Stemmer et al. (1995) Gene 164, pp. 49-53). Unintended
mutations that interfered with the design goals of the synthetic genes were corrected.

a) Design of synthetic oligonucleotides:

The synthetic oligonucleotides were mostly 40mers that collectively code for
both complete strands of each designed gene (1,626 bp) plus flanking regions
needed for cloning (1,950 bp total for each gene; Figure 6). The 5' and 3' boundaries
of all oligonucleotides specifying one strand were generally placed in a manner to
give an average offset/overlap of 20 bases relative to the boundaries of the
oligonucleotides specifying the opposite strand.
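The staggered arrangement of oligonucleotides on the two strands can be
pictured with the following sketch. It assumes strictly regular 40-base oligos
and a fixed 20-base offset, whereas the actual boundaries were placed only
"generally" in this manner; the function names are hypothetical and the test
sequence is made up.

```python
def revcomp(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def tile_oligos(seq, length=40, offset=20):
    """Split a duplex into staggered top- and bottom-strand oligos.

    Top-strand oligos start at 0, length, 2*length, ...; bottom-strand oligos
    cover windows shifted by `offset`, so that each oligo overlaps its
    neighbours on the opposite strand by `offset` bases. End pieces may be
    shorter than `length`.
    """
    top = [seq[i:i + length] for i in range(0, len(seq), length)]
    cuts = [0] + list(range(offset, len(seq), length)) + [len(seq)]
    bottom = [revcomp(seq[a:b]) for a, b in zip(cuts, cuts[1:]) if b > a]
    return top, bottom

demo = "ACGT" * 30  # made-up 120-base sequence
top, bottom = tile_oligos(demo)
print(len(top), len(bottom))           # 3 4
print([len(o) for o in bottom])        # [20, 40, 40, 20]
```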

The ends of the flanking regions of both genes matched the ends of the
amplification primers (pRAMtailup: 5'-gtactgagacgacgccagcccaagcttaggcctgagtg
SEQ ID NO:229, and pRAMtaildn: 5'-ggcatgagcgtgaactgactgaactagcggccgccgag
SEQ ID NO:230) to allow cloning of the genes into our E. coli expression vector
pRAM (WO99/14336).

A total of 183 oligonucleotides were designed (Figure 6): fifteen
oligonucleotides that collectively encode the upstream and downstream flanking
sequences (identical for both genes; SEQ ID NOs: 35-49) and 168 oligonucleotides
(4 x 42) that encode both strands of the two genes (SEQ ID NOs: 50-217).

All 183 oligonucleotides were run through the hairpin analysis of the
OLIGO software (OLIGO 4.0 Primer Analysis Software © 1989-1991 by Wojciech
Rychlik) to identify potentially detrimental intra-molecular loop formation. The
guidelines for evaluating the analysis results were set according to recommendations
of Dr. Sims (Sigma-Genosys Custom Gene Synthesis Department): oligos forming
hairpins with ΔG < -10 have to be avoided, those forming hairpins with ΔG < -7
involving the 3' end of the oligonucleotide should also be avoided, while those with
an overall ΔG < -5 should not pose a problem for this application. The analysis
identified 23 oligonucleotides able to form hairpins with a ΔG between -7.1 and
-4.9. Of these, 5 had blocked or nearly blocked 3' ends (0-3 free bases) and were
redesigned by removing 1-4 bases at their 3' end and adding them to the adjacent
oligonucleotide.

The 40mer oligonucleotide covering the sequence complementary to the
poly(A) tail had a very low complexity 3' end (13 consecutive T bases). An
additional 40mer was designed with a high complexity 3' end but a consequently
reduced overlap with one of its complementary oligonucleotides (11 instead of 20
bases) on the opposite strand.

Even though the oligos were designed for use in a thermocycler-based
assembly reaction, they could also be used in a ligation-based protocol for gene
construction. In this approach, the oligonucleotides are annealed in a pairwise
fashion and the resulting short double-stranded fragments are ligated using the
sticky overhangs. However, this would require that all oligonucleotides be
phosphorylated.

b) Gene assembly and amplification

In a first step, each of the two synthetic genes was assembled in a separate
reaction from 98 oligonucleotides. The total volume for each reaction was 50 µl:
0.5 µM oligonucleotides (= 0.25 pmoles of each oligo)
1.0 U Taq DNA polymerase
0.02 U Pfu DNA polymerase
2 mM MgCl2
0.2 mM dNTPs (each)
0.1% gelatin
Cycling conditions: (94°C for 30 seconds, 52°C for 30 seconds, and 72°C for 30
seconds) x 55 cycles.

In a second step, each assembled synthetic gene was amplified in a separate
reaction. The total volume for each reaction was 50 µl:
2.5 µl assembly reaction
5.0 U Taq DNA polymerase
0.1 U Pfu DNA polymerase
1 µM each primer (pRAMtailup, pRAMtaildn)
2 mM MgCl2
0.2 mM dNTPs (each)
Cycling conditions: (94°C for 20 seconds, 65°C for 60 seconds, 72°C for 3
minutes) x 30 cycles.
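The relationship between the oligonucleotide concentration and the amount of
each oligo in the assembly reaction can be verified arithmetically. The sketch
below assumes that 0.5 µM is the combined concentration of all 98
oligonucleotides in the 50 µl reaction, which is the reading consistent with
the stated 0.25 pmoles of each oligo; the function name is hypothetical.

```python
def pmol_per_oligo(total_conc_uM, volume_ul, n_oligos):
    """Amount of each oligo in pmol, given the summed concentration of all oligos."""
    total_pmol = total_conc_uM * volume_ul  # 1 uM in 1 ul corresponds to 1 pmol
    return total_pmol / n_oligos

print(round(pmol_per_oligo(0.5, 50, 98), 3))  # 0.255
```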
The assembled and amplified genes were subcloned into the pRAM vector
and expressed in E. coli, yielding 1-2% luminescent GR or RD clones. Five GR and
five RD clones were isolated and analyzed further. Of the five GR clones, three had
the correct insert size, of which one was weakly luminescent and one had an altered
restriction pattern. Of the five RD clones, two had the correct size insert with an
altered restriction pattern and one of those was weakly luminescent. Overall, the
analysis indicated the presence of a large number of mutations in the genes, most
likely the result of errors introduced in the assembly and amplification reactions.

c) Corrective assembly and amplification

To remove the large number of mutations present in the full-length synthetic
genes we performed an additional assembly and amplification reaction for each gene
using the proof-reading DNA polymerase Tli. The assembly reaction contained, in
addition to the 98 GR or RD oligonucleotides, a small amount of DNA from the
corresponding full-length clones with mutations described above. This allows the
oligos to correct mutations present in the templates.

The following assembly reaction was performed for each of the synthetic
genes. The total volume for each reaction was 50 µl:
0.5 µM oligonucleotides (= 0.25 pmoles of each oligo)
0.016 pmol plasmid (mix of clones with correct insert size)
2.5 U Tli DNA polymerase
2 mM MgCl2
0.2 mM dNTPs (each)
0.1% gelatin
Cycling conditions: 94°C for 30 seconds, then (94°C for 30 seconds, 52°C for 30
seconds, 72°C for 30 seconds) for 55 cycles, then 72°C for 5 minutes.

The following amplification reaction was performed on each of the assembly
reactions. The total volume for each amplification reaction was 50 µl:
1-5 µl of assembly reaction
40 pmol each primer (pRAMtailup, pRAMtaildn)
2.5 U Tli DNA polymerase
2 mM MgCl2
0.2 mM dNTPs (each)
Cycling conditions: 94°C for 30 seconds, then (94°C for 20 seconds, 65°C for 60
seconds and 72°C for 3 minutes) for 30 cycles, then 72°C for 5 minutes.

The genes obtained from the corrective assembly and amplification step
were subcloned into the pRAM vector and expressed in E. coli, yielding 75%
luminescent GR or RD clones. Forty-four GR and 44 RD clones were analyzed
with our screening robot (WO99/14336). The six best GR and RD clones were
manually analyzed and one best GR and RD clone was selected (GR6 and RD7).
Sequence analysis of GR6 revealed two point mutations in the coding region, both
of which resulted in an amino acid substitution (S49N and P230S). Sequence
analysis of RD7 revealed three point mutations in the coding region, one of which
resulted in an amino acid substitution (H36Y). It was confirmed that none of the
silent point mutations introduced any regulatory or restriction sites conflicting with
the overall design criteria for the synthetic genes.

d) Reversal of unintended amino acid substitutions

The unintended amino acid substitutions present in the GR6 and RD7
synthetic genes were reversed by site-directed mutagenesis to match the GRver5 and
RDver5 designed sequences, thereby creating GRver5.1 and RDver5.1. The DNA
sequences of the mutated regions were confirmed by sequence analysis.

e) Improve spectral properties

The RDver5.1 gene was further modified to improve its spectral properties
by introducing an amino acid change (R351G), thereby creating RDver5.2.

pGL3 vectors with RD and GR genes

The parent click beetle luciferase YG#81-6G1 ("YG"), and the synthetic click
beetle luciferase genes GRver5.1 ("GR"), RDver5.2 ("RD"), and RD156-1H9 were
cloned into the four pGL3 reporter vectors (Promega Corp.):

- pGL3-Basic = no promoter, no enhancer

- pGL3-Control = SV40 promoter, SV40 enhancer

- pGL3-Enhancer = SV40 enhancer (3' to luciferase coding sequences)

- pGL3-Promoter = SV40 promoter.

The primers employed in the assembly of GR and RD synthetic genes facilitated the
cloning of those genes into pRAM vectors. To introduce the genes into pGL3
vectors (Promega Corp., Madison, WI) for analysis in mammalian cells, each gene
in a pRAM vector (pRAM RDver5.1, pRAM GRver5.1, and pRAM RD156-1H9)
was amplified to introduce an Nco I site at the 5' end and an Xba I site at the 3' end
of the gene. The primers for pRAM RDver5.1 and pRAM GRver5.1 were:
GR → 5' GGA TCC CAT GGT GAA GCG TGA GAA 3' (SEQ ID NO:231) or
RD → 5' GGA TCC CAT GGT GAA ACG CGA 3' (SEQ ID NO:232) and
5' CTA GCT TTT TTT TCT AGA TAA TCA TGA AGA C 3' (SEQ ID NO:233)
The primers for pRAM RD156-1H9 were:
5' GCG TAG CCA TGG TAA AGC GTG AGA AAA ATG TC 3' (SEQ ID NO:295) and
5' CCG ACT CTA GAT TAC TAA CCG CCG GCC TTC ACC 3' (SEQ ID NO:296)

The PCR included:

100 ng DNA plasmid
1 µM primer upstream
1 µM primer downstream
0.2 mM dNTPs
1X buffer (Promega Corp.)
5 units Pfu DNA polymerase (Promega Corp.)
Sterile nanopure H2O to 50 µl

The cycling parameters were: 94°C for 5 minutes; (94°C for 30 seconds;
55°C for 1 minute; and 72°C for 3 minutes) x 15 cycles. The purified PCR product
was digested with Nco I and Xba I, ligated with pGL3-control that was also digested
with Nco I and Xba I, and the ligated products introduced to E. coli. To insert the
luciferase genes into the other pGL3 reporter vectors (basic, promoter and
enhancer), the pGL3-control vectors containing each of the luciferase genes were
digested with Nco I and Xba I, ligated with other pGL3 vectors that also were
digested with Nco I and Xba I, and the ligated products introduced to E. coli. Note
that the polypeptide encoded by GRver5.1 and RDver5.1 (and RD156-1H9, see
below) nucleic acid sequences in pGL3 vectors has an amino acid substitution at
position 2 to valine as a result of the Nco I site at the initiation codon in the
oligonucleotide.
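
The valine at position 2 can be read directly off the upstream primer: the Nco I site (CCATGG) places a G immediately after the ATG, and the primer continues with TG. The following is a minimal sketch, assuming the standard genetic code and using the GR upstream primer (SEQ ID NO:231); the helper name `translate_from_start` and the partial codon table are illustrative and not part of the original procedure.

```python
# Partial codon table (standard genetic code); only the codons needed here.
CODONS = {"ATG": "M", "GTG": "V", "AAG": "K", "CGT": "R", "GAG": "E"}

def translate_from_start(primer: str) -> str:
    """Translate the complete codons that follow the first ATG in a primer."""
    seq = primer.replace(" ", "").upper()
    start = seq.find("ATG")
    # Step through whole codons only; a trailing partial codon is ignored.
    codons = [seq[i:i + 3] for i in range(start, len(seq) - 2, 3)]
    return "".join(CODONS[c] for c in codons)

gr_primer = "GGA TCC CAT GGT GAA GCG TGA GAA"  # SEQ ID NO:231
peptide = translate_from_start(gr_primer)
print(peptide)  # residue 2 is V (valine)
```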

Because of internal Nco I and Xba I sites, the native gene in YG#81-6G01
was amplified from a Hind III site upstream to a Hpa I site downstream of the
coding region, a region that included flanking sequences found in the GR and RD
clones. The upstream primer (5'-CAA AAA GCT TGG CAT TCC GGT ACT GTT
GGT AAA GCC ACC ATG GTG AAG CGA GAG-3'; SEQ ID NO:234) and a
downstream primer (5'-CAA TTG TTG TTG TTA ACT TGT TTA TT-3'; SEQ ID
NO:235) were mixed with YG#81-6G01 and amplified using the PCR conditions
above. The purified PCR product was digested with Hind III and Hpa I, ligated with
pGL3-control that was also digested with Hind III and Hpa I, and the ligated
products introduced into E. coli. To insert YG#81-6G01 into the other pGL3
reporter vectors (basic, promoter and enhancer), the pGL3-control vectors
containing YG#81-6G01 were digested with Nco I and Xba I, ligated with the other
pGL3 vectors that also were digested with Nco I and Xba I, and the ligated products
introduced to E. coli. Note that the clone of YG#81-6G01 in the pGL3 vectors has a
C instead of an A at base 786, which yields a change in the amino acid sequence at
residue 262 from Phe to Leu (Figure 2 shows the sequence of YG#81-6G01 prior to
introduction into pGL3 vectors). To determine whether the altered amino acid at
position 262 affected the enzyme biochemistry, the clone of YG#81-6G01 was
mutated to resemble the original sequence. Both clones were then tested for
expression in E. coli, physical stability, substrate binding, and luminescence output
kinetics. No significant differences were found.

Partially purified enzymes expressed from the synthetic genes and the parent
gene were employed to determine Km for luciferin and ATP (see Table 3).

Table 3

Enzyme        Km (LH2)     Km (ATP)
YG parent                  17 µM
GR            1.3 µM       25 µM
RD            24.5 µM      46 µM

In vitro eukaryotic transcription/translation reactions were also conducted
using Promega's TNT T7 Quick system according to manufacturer's instructions.
Luminescence levels were 1 to 37-fold and 1 to 77-fold higher (depending on the
reaction time) for the synthetic GR and RD genes, respectively, compared to the
parent gene (corrected for luminometer spectral sensitivity).

To test whether the synthetic click beetle luciferase genes have improved
expression in mammalian cells relative to the wild type click beetle gene, each of the
synthetic genes and the parent gene was cloned into a series of pGL3 vectors and
introduced into CHO cells (Table 8). In all cases, the synthetic click beetle genes
exhibited a higher expression than the native gene. Specifically, expression of the
synthetic GR and RD genes was 1900-fold and 40-fold higher, respectively, than
that of the parent (transfection efficiency normalized by comparison to native
Renilla luciferase gene). Moreover, the data (basic versus control vector) show that
the synthetic genes have reduced basal level transcription.
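
The fold increases quoted above follow from dividing each control-vector reading in Table 8 by the reading for the parent gene. A short sketch of that arithmetic, using the Table 8 values as given; the function name `fold_over_parent` is illustrative only.

```python
# Relative light units from the pGL3-control constructs in CHO cells (Table 8).
rlu = {
    "YG#81-6G01": 177,
    "GRver5.1": 343_417,
    "RDver5.1": 7_161,
    "RD156-1H9": 20_802,
}

def fold_over_parent(values, parent="YG#81-6G01"):
    """Express every non-parent reading as a multiple of the parent reading."""
    return {name: v / values[parent] for name, v in values.items() if name != parent}

folds = fold_over_parent(rlu)
for name, fold in folds.items():
    print(f"{name}: {fold:.0f}-fold over parent")
```

The GR and RD ratios come out at roughly 1940 and 40, in line with the rounded figures in the text.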

Further, in experiments with the enhancer vector where the percentage of
activity in reference to the control is compared between the native and synthetic
gene, the data showed that the synthetic genes have reduced risk of anomalous
transcription characteristics. In particular, the parent gene appeared to contain one
or more internal transcriptional regulatory sequences that are activated by the
enhancer in the vector, and thus is not suitable as a reporter gene, while the synthetic
GR and RD genes showed a clean reporter response (transfection efficiency
normalized by comparison to native Renilla luciferase gene). See Table 9.

The clone names and their corresponding SEQ ID numbers for nucleotide
sequence and amino acid sequence are listed below in Table 4.

Table 4

Clone name     Luciferase Type                 SEQ ID NO.      SEQ ID NO.
                                               (nucleotide)    (amino acid)
LUCPPLYG       Wild type YG Click Beetle       1               23
YG#81-6G01     Mutant YG Click Beetle          2               24
GRver1         Synthetic Green Click Beetle    3               25
GRver2         Synthetic Green Click Beetle    4               26
GRver3         Synthetic Green Click Beetle    5               27
GRver4         Synthetic Green Click Beetle    6               28
GRver5         Synthetic Green Click Beetle    7               29
GR6            Synthetic Green Click Beetle    8               30
GRver5.1       Synthetic Green Click Beetle    9               31
RDver1         Synthetic Red Click Beetle      10              32
RDver2         Synthetic Red Click Beetle      11              33
RDver3         Synthetic Red Click Beetle      12              34
RDver4         Synthetic Red Click Beetle      13              218
RDver5         Synthetic Red Click Beetle      14              219
RD7            Synthetic Red Click Beetle      15              220
RDver5.1       Synthetic Red Click Beetle      16              221
RDver5.2       Synthetic Red Click Beetle      17              222
RD156-1H9      Synthetic Red Click Beetle      18              223
RELLUC         Wild type Renilla               19              224
Rlucver1       Synthetic Renilla               20              225
Rlucver2       Synthetic Renilla               21              226
Rluc-final     Synthetic Renilla               22              227

Example 2 

Evolution of the RD luciferase gene

RDver5.2 was mutated to increase its luminescence intensity, thereby creating
RD156-1H9 which carries four additional amino acid changes (M2I, S349T, K488T,
E538V) and three silent point mutations (SEQ ID NO:18).
a) Site-directed mutagenesis:

The initial strategy was to use site-directed mutagenesis. There are four amino
acid differences between the GR and RD synthetic genes with H348Q providing the
greatest contribution to red color. Thus, this substitution may also cause structural
changes in the protein that could lead to low light output. Optimization of positions
near this area could increase light output. The following positions were selected for
mutagenesis:

1. S344 (at the edge of the binding pocket for luciferin) - randomize this
codon.

2. A245 (strictly conserved but closest to 348 and at the edge of the active site
pocket) - randomize this codon.

3. I347 (not conserved, next to 348 in sequence) - mutate to hydrophobic
amino acids only.

4. S349 (not conserved, next to 348 in sequence) - mutate to S, T, A, P only.
Oligonucleotides designed to mutate the above positions were used in a site-
directed mutagenesis experiment (WO 99/14336) and the resulting mutants were
screened for luminescence intensity. There was little variation in light intensity and
only about 25% were luminescent. For more detailed analysis, clones were picked
and analyzed with the screening robot (WO 99/14336). None of the clones had
a luminescence intensity (LI) higher than RDver5.2, but four of the clones had
slightly lower composite Km for luciferin and ATP (Km).
b) Directed evolution:
Protocols and procedures used for the directed evolution are detailed in
WO 99/14336. DNA from the four clones with lower Km was combined and
three libraries of random mutants were produced. The libraries were screened with
the robot and clones with the highest LI values were selected. These clones were
shuffled together and another robotic screen was completed with an incubation
temperature of 46°C. The three clones with the highest LI values were RD156-0B4,
RD156-1A5, and RD156-1H9.
c) Analysis:

The three clones with the highest LI values were selected for manual analysis to
confirm that their luminescence intensity was higher than that of RDver5.2 and to
ensure that their spectral properties were not compromised. One of the clones was
slightly green-shifted; all others maintained the spectral properties of RDver5.2
(Table 5).



Table 5

Clone                 Peak (nm)    Width (nm)
RD156-0B4             616          68
RD156-1A5             614          70
RD156-1H9             618          69
RDver5.2 (prep #1)    617          70
RDver5.2 (prep #2)    618          69

The Km values for luciferin and the luminescence intensity relative to
RDver5.2 were determined for all three clones in several independent experiments.
All cell samples were processed with CCLR lysis buffer (E1483, Promega Corp.,
Madison, WI) and diluted 1:10 into buffer (25 mM HEPES pH 7.8, 5% glycerol, 1
mg/ml BSA, 150 mM NaCl). Table 6 summarizes the results (Lum: luminescence
values were normalized to optical density; measurements for independent
experiments are separated by forward slashes) from expression in bacterial cells.
RD156-1H9, the clone with the highest luminescence intensity (5 to 10-fold
increase), also has an approximately 2-fold higher Km for luciferin.

Table 6

Clone                 Km Luciferin [µM]    Lum (normalized to RDver5.2)
RD156-0B4             8/10                 2.2/2.5
RD156-1A5             13/13                3.1/5.6
RD156-1H9             20/23/23             4/10.9/7.5
RDver5.2 (prep #1)    12/14/14
RDver5.2 (prep #2)    40/50
GRver5.1 (prep #1)    0.5                  64
GRver5.1 (prep #2)    3





Table 7 shows a comparison between the luminescence intensities of
RD156-1H9, GRver5.1 and RDver5.2 normalized to GRver5.1 with and without
correction for the spectral sensitivity of the luminometer photomultiplier tube. With
correction, the luminescence intensity of clone RD156-1H9 was only about 2-fold
lower than that of GRver5.1. The luciferin Km for clone RD156-1H9 is
approximately 40-fold higher than GRver5.1. RD156-1H9 is thermostable at 50°C
for at least 2 hours.

Table 7

Name          No Correction    With Correction
RDver5.2      0.016            0.06
GRver5.1      1.000            1.00
RD156-1H9     0.116            0.45



Tables 8 and 9 show a comparison of luciferase expression levels in CHO
cells. Table 8 shows the expression levels only from the control vectors in
comparison to the firefly luciferase gene (RLU = relative light units). Table 9
shows a comparison of the expression levels in all four pGL3 vectors calculated as a
percent of the expression level in pGL3-control.

Table 8

Synthetic Click Beetle Gene Expression

Control vector    RLU
YG#81-6G01        177
GRver5.1          343,417
RDver5.1          7,161
RD156-1H9         20,802
Firefly           488,016

Table 9

Synthetic Click Beetle Gene Expression

Vector                 Percent of control vector
YG-control             100
RD-control             100
GR-control             100
RD156-1H9 control      100
YG-basic               3.3
RD-basic               1.0
GR-basic               0.2
RD156-1H9 basic        0.3
YG-promoter            4.2
RD-promoter            15.1
GR-promoter            5.7
RD156-1H9 promoter     15.5
YG-enhancer            51.5
RD-enhancer            2.8
GR-enhancer            1.4
RD156-1H9 enhancer     0.3



Example 3 

Synthetic Renilla Luciferase Nucleic Acid Molecule
The synthetic Renilla luciferase genes prepared include 1) an introduced
Kozak sequence, 2) codon usage optimized for mammalian (human) expression, 3) a
reduction or elimination of unwanted restriction sites, 4) removal of prokaryotic
regulatory sites (ribosome binding site and TATA box), 5) removal of splice sites
and poly(A) addition sites, and 6) a reduction or elimination of mammalian
transcriptional factor binding sequences.

The process of computer-assisted design of synthetic Renilla luciferase
genes by iterative rounds of codon optimization and removal of transcription factor
binding sites and other regulatory sites as well as restriction sites can be described in
three steps:

1. Using the wild type Renilla luciferase gene as the parent gene, codon usage was
optimized, one amino acid was changed (T→A) to generate a Kozak consensus
sequence, and undesired restriction sites were eliminated, thereby creating
synthetic gene Rlucver1.

2. Remove prokaryotic regulatory sites, splice sites, poly(A) sites and transcription
factor (TF) binding sites (first pass). Then remove newly created TF binding
sites. Then remove newly created undesired restriction enzyme sites,
prokaryotic regulatory sites, splice sites, and poly(A) sites without introducing
new TF binding sites. This created Rlucver2.

3. Change 3 bases of Rlucver2, thereby creating Rluc-final.

4. The actual gene was then constructed from synthetic oligonucleotides
corresponding to the Rluc-final designed sequence. All mutations resulting from
the assembly or PCR process were corrected. This gene is Rluc-final (SEQ ID
NO:22) and encodes the amino acid sequence of SEQ ID NO:227.

Codon Selection 

Starting with the Renilla reniformis luciferase sequence in Genbank
(Accession No. M63501, SEQ ID NO:19), codons were selected based on codon
usage for optimal expression in human cells and to avoid E. coli low-usage codons.
The best codon for expression in human cells (or the best two codons if found at a
similar frequency) was chosen for all amino acids with more than one codon (Wada
et al., 1990):

Arg: CGC        Lys: AAG
Asn: AAC        Gln: CAG
1 ill .         His: CAC
Pro: CCA/CCT    Glu: GAG
Ala: GCC        Asp: GAC
Gly: GGC        Tyr: TAC
Val: GTG        Cys: TGC
Ile: ATC/ATT    Phe: TTC

In cases where two codons were selected for one amino acid, they were used
in an alternating fashion. To meet other criteria for the synthetic gene, the initial
optimal codon selection was modified to some extent later. For example,
introduction of a Kozak sequence required the use of GCT for Ala at amino acid
position 2 (see below).
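
The selection rule above (one preferred codon per amino acid, two codons used alternately where two were chosen, with later position-specific exceptions) can be sketched as a simple back-translation. This is an illustration under stated assumptions, not the design software used for the synthetic genes: the table holds only the entries that are clearly legible in the list above plus the single-codon amino acids Met and Trp, and the names `PREFERRED`, `back_translate` and `overrides` are hypothetical.

```python
from itertools import cycle

# Preferred codons taken from the list above (subset), plus Met and Trp,
# which have only one codon. Amino acids not listed raise a KeyError.
PREFERRED = {
    "R": ["CGC"], "K": ["AAG"], "A": ["GCC"], "D": ["GAC"], "G": ["GGC"],
    "Y": ["TAC"], "V": ["GTG"], "C": ["TGC"], "I": ["ATC", "ATT"],
    "F": ["TTC"], "M": ["ATG"], "W": ["TGG"],
}

def back_translate(protein, overrides=None):
    """Pick codons for a protein; two-codon amino acids alternate in order of use.

    overrides maps a 0-based residue index to a forced codon, e.g. the GCT
    required at position 2 by the Kozak sequence.
    """
    overrides = overrides or {}
    pickers = {aa: cycle(codons) for aa, codons in PREFERRED.items()}
    return "".join(
        overrides.get(i) or next(pickers[aa]) for i, aa in enumerate(protein)
    )

# Toy peptide (not a luciferase sequence): Ala at residue 2 is forced to GCT,
# and the two Ile residues receive ATC and then ATT.
dna = back_translate("MAKIVI", overrides={1: "GCT"})
print(dna)
```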

The following low-usage codons in mammalian cells were not used unless
needed: Arg: CGA, CGU; Leu: CTA, UUA; Ser: TCG; Pro: CCG; Val: GTA;
and Ile: ATA. The following low-usage codons in E. coli were also avoided when
reasonable (note that 3 of these match the low-usage list for mammalian cells): Arg:
CGA/CGG/AGA/AGG; Leu: CTA; Pro: CCC; Ile: ATA.
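
The two avoidance lists can be treated as sets, which makes the overlap noted in the text easy to confirm and gives a simple way to flag offending codons in a candidate reading frame. A minimal sketch: the codons are written in DNA form (U replaced by T), and the function name and the short test sequence are made up for illustration.

```python
# Low-usage codons as listed in the text, written as DNA (U -> T).
MAMMALIAN_LOW = {"CGA", "CGT", "CTA", "TTA", "TCG", "CCG", "GTA", "ATA"}
ECOLI_LOW = {"CGA", "CGG", "AGA", "AGG", "CTA", "CCC", "ATA"}

def low_usage_codons(orf, avoid):
    """Return (0-based codon index, codon) for each in-frame codon in `avoid`."""
    orf = orf.upper().replace("U", "T")
    return [
        (i // 3, orf[i:i + 3])
        for i in range(0, len(orf) - 2, 3)
        if orf[i:i + 3] in avoid
    ]

shared = MAMMALIAN_LOW & ECOLI_LOW
print(sorted(shared))  # the codons that appear on both lists

# Hypothetical 5-codon frame: ATG CGA GGC ATA TAA
hits = low_usage_codons("ATGCGAGGCATATAA", MAMMALIAN_LOW | ECOLI_LOW)
print(hits)
```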
Introduction of Kozak Sequences

The Kozak sequence: 5' aaccATGGCT 3' (SEQ ID NO:293) (the Nco I site
is underlined, the coding region is shown in capital letters) was introduced to the
synthetic Renilla luciferase gene. The introduction of the Kozak sequence changes
the second amino acid from Thr to Ala (GCT).
Removal of undesired restriction sites

REBASE ver. 808 (updated August 1, 1998; Restriction Enzyme Database;
www.neb.com/rebase) was employed to identify undesirable restriction sites as
described in Example 1. The following undesired restriction sites (in addition to
those described in Example 1) were removed according to the process described in
Example 1: EcoICR I, Nde I, Nsi I, Sph I, Spe I, Xma I, Pst I.
The version of Renilla luciferase (Rluc) which incorporates all these changes
is Rlucver1.
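
Screening a designed sequence for the restriction sites named above amounts to a substring search. The sketch below is not the REBASE-based procedure of Example 1. The recognition sequences are the commonly cited ones for these enzymes, supplied here as an assumption that should be checked against REBASE; all seven are palindromic, so scanning one strand is sufficient. The test sequence is invented.

```python
# Commonly cited recognition sequences (assumed; verify against REBASE).
SITES = {
    "EcoICR I": "GAGCTC",
    "Nde I": "CATATG",
    "Nsi I": "ATGCAT",
    "Sph I": "GCATGC",
    "Spe I": "ACTAGT",
    "Xma I": "CCCGGG",
    "Pst I": "CTGCAG",
}

def find_sites(seq, sites=SITES):
    """Return (enzyme, 1-based position) for every occurrence, sorted by position."""
    seq = seq.upper()
    hits = []
    for enzyme, site in sites.items():
        pos = seq.find(site)
        while pos != -1:
            hits.append((enzyme, pos + 1))
            pos = seq.find(site, pos + 1)  # allow overlapping occurrences
    return sorted(hits, key=lambda h: h[1])

# Invented sequence containing one Nde I site and one Pst I site.
found = find_sites("AAACATATGTTTCTGCAGAAA")
print(found)
```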

Removal of prokaryotic (E. coli) regulatory sequences, splice sites, and poly(A)
sites

The priority and process for eliminating transcription regulation sites was as
described in Example 1.

Removal of TF binding sites

The same process, tools, and criteria were used as described in Example 1;
however, the newer version 3.3 of the TRANSFAC database was employed.

After removing prokaryotic regulatory sequences, splice sites and poly(A)
sites from Rlucver1, the first search for TF binding sites identified about 60 hits. All
sites were eliminated with the exception of three that could not be removed without
altering the amino acid sequence of the synthetic Renilla gene:
1. site at position 63 composed of two codons for W
(TGGTGG), for CAC-binding protein T00076;

2. site at position 522 composed of codons for KMV (AAN
ATGGTN), for myc-DFl T00517;

3. site at position 885 composed of codons for EMG (GAR
ATGGGN), for myc-DFl T00517.
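
The consensus strings quoted for the three remaining sites use IUPAC ambiguity codes (N for any base, R for A or G). The following sketch shows how such a consensus can be matched against a sequence; it illustrates the notation only and is not a substitute for the TESS/TRANSFAC searches described in the text. The function name and the test sequences are hypothetical.

```python
import re

# IUPAC nucleotide ambiguity codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def consensus_hits(consensus, seq):
    """Return 1-based start positions where `seq` matches the IUPAC consensus."""
    pattern = "".join(IUPAC[base] for base in consensus.upper())
    # A zero-width lookahead reports overlapping matches as well.
    return [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq.upper())]

# Invented test sequences containing one match each.
print(consensus_hits("AANATGGTN", "CCAAGATGGTGCC"))
print(consensus_hits("GARATGGGN", "GAGATGGGC"))
```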

The subsequent second search for (newly introduced) TF binding sites yielded about
20 hits. All new sites were eliminated, leaving only the three sites described above.
Finally, any newly introduced restriction sites, prokaryotic regulatory sequences,
splice sites and poly(A) sites were removed without introducing new TF binding
sites if possible.

Rlucver2 was obtained (SEQ ID Nos. 21 and 226).

As in Example 1, lower stringency search parameters were specified for the
TESS filtered string search to further evaluate the synthetic Renilla gene.

With the LLH reduced from 10 to 9 and the minimum element length
reduced from 5 to 4, the TESS filtered string search did not show any new hits.
When, in addition to the parameter changes listed above, the organism classification
was expanded from "mammalia" to "chordata", the search yielded only four more
TF binding sites. When the Min LLH was further reduced to between 8 and 0, the
search showed two additional 5-base sites (MAMAG and CTKTK) which
combined had four matches in Rlucver2, as well as several 4-base sites. Also as in
Example 1, Rlucver2 was checked for hits to entries in the EPD (Eukaryotic
Promoter Database, Release 45). Three hits were determined (one to Mus musculus
promoter H-2LM (Cell, 44, 261 (1986)), one to Herpes Simplex Virus type 1
promoter b'g7.7 kb, and one to Homo sapiens DHFR promoter (J. Mol. Biol., 176,
169 (1984))). However, no further changes were made to Rlucver2.

Summary of Properties for Rlucver2

- All 30 low usage codons were eliminated. The introduction of a Kozak
sequence changed the second amino acid from Thr to Ala;
- base composition: 55.7% GC (Renilla wild-type parent gene: 36.5%);
- one undesired restriction site could not be eliminated: EcoR V at position
488;
- the synthetic gene had no prokaryotic promoter sequence but one potentially
functional ribosome binding site (RBS) at positions 867-73 (about 13 bases
upstream of a Met codon) could not be eliminated;
- all poly(A) addition sites were eliminated;
- splice sites: 2 donor splice sites could not be eliminated (both share the
amino acid sequence MGK);
- TF sites: all sites with a consensus of >4 unambiguous bases were
eliminated (about 280 TF binding sites were removed) with 3 exceptions due
to the preference to avoid changes to the amino acid sequence.
Synthetic Renilla luciferase sequences are shown in Figures 7 and 8. A codon usage
comparison is shown in Figure 9.
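
The base composition figure in the summary is a GC percentage. Because the full gene sequences are given only in the figures, the sketch below applies the same calculation to the 40-base sense strand assembly primer (SEQ ID NO:236), which is drawn from the 5' end of the synthetic gene; the function name is illustrative.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)

# Sense strand primer for the synthetic Renilla gene (SEQ ID NO:236).
sense_primer = "AACCATGGCTTCCAAGGTGTACGACCCCGAGCAACGCAAA"
print(f"{gc_percent(sense_primer):.1f}% GC")
```

The primer comes out at 55.0% GC, close to the 55.7% reported for the whole of Rlucver2.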

When introduced into pGL3, Rluc-final has a Kozak sequence
(CACCATGGCT). The changes in Rluc-final relative to Rlucver2 were introduced
during gene assembly. One change was at position 619, a C to an A, which
eliminated a eukaryotic promoter sequence and reduced the stability of a hairpin
structure in the corresponding oligonucleotide employed to assemble the gene.
Other changes included a change from CGC to AGA at positions 218-220 (resulted
in a better oligonucleotide for PCR).

Gene Assembly Strategy

The gene assembly protocol employed for the synthetic Renilla luciferase
was similar to that described in Example 1. The oligonucleotides employed are
shown in Figure 10.

Sense Strand primer:

5' AACCATGGCTTCCAAGGTGTACGACCCCGAGCAACGCAAA 3' (SEQ ID
NO:236)
Anti-sense Strand primer:

5' GCTCTAGAATTACTGCTCGTTCTTCAGCACGCGCTCCACG 3' (SEQ ID
NO:237)

The resulting synthetic gene fragment was cloned into a pRAM vector using
Nco I and Xba I. Two clones having the correct size insert were sequenced. Four to
six mutations were found in the synthetic gene from each clone. These mutations
were fixed by site-directed mutagenesis (Gene Editor from Promega Corp.,
Madison, WI) and swapping the correct regions between these two genes. The
corrected gene was confirmed by sequencing.

Other Vectors

To prepare an expression vector for the synthetic Renilla luciferase gene in a
pGL3-control vector backbone, 5 µg of pGL3-control was digested with Nco I and
Xba I in 50 µl final volume with 2 µl of each enzyme and 5 µl 10X buffer B
(nanopure water was used to fill the volume to 50 µl). The digestion reaction was
incubated at 37°C for 2 hours, and the whole mixture was run on a 1% agarose gel
in 1X TAE. The desired vector backbone fragment was purified using Qiagen's
QIAquick gel extraction kit.

The native Renilla luciferase gene fragment was cloned into pGL3-control
vector using two oligonucleotides, Nco I-RL-F and Xba I-RL-R, to PCR amplify
native Renilla luciferase gene using pRL-CMV as the template. The sequence for
Nco I-RL-F is 5'-CGCTAGCCATGGCTTCGAAAGTTTATGATCC-3' (SEQ ID
NO:238); the sequence for Xba I-RL-R is
5'-GGCCAGTAACTCTAGAATTATTGTT-3' (SEQ ID NO:239). The PCR
reaction was carried out as follows:
Reaction mixture (for 100 µl):

DNA template (Plasmid)    1.0 µl (1.0 ng/µl final)
10X Rec. Buffer           10.0 µl (Stratagene Corp.)
dNTPs (25 mM each)        1.0 µl (final 250 µM)
Primer 1 (10 µM)          2.0 µl (0.2 µM final)
Primer 2 (10 µM)          2.0 µl (0.2 µM final)
Pfu DNA Polymerase        2.0 µl (2.5 U/µl, Stratagene Corp.)
82.0 µl double distilled water
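
The reaction mixture above is internally consistent, which can be confirmed with the usual dilution relation (final concentration = stock concentration x volume added / total volume). A short sketch of that check, using only the volumes and stock concentrations listed; the variable and function names are illustrative.

```python
# Component volumes in µl, as listed in the reaction mixture above.
mix_ul = {
    "DNA template": 1.0,
    "10X buffer": 10.0,
    "dNTPs": 1.0,
    "Primer 1": 2.0,
    "Primer 2": 2.0,
    "Pfu DNA polymerase": 2.0,
    "water": 82.0,
}

def final_conc(stock, volume_ul, total_ul):
    """Final concentration after dilution, in the same units as `stock`."""
    return stock * volume_ul / total_ul

total_ul = sum(mix_ul.values())
dntp_um = final_conc(25_000, mix_ul["dNTPs"], total_ul)   # 25 mM stock, in µM
primer_um = final_conc(10, mix_ul["Primer 1"], total_ul)  # 10 µM stock
pfu_units = mix_ul["Pfu DNA polymerase"] * 2.5            # 2.5 U/µl stock

print(total_ul, dntp_um, primer_um, pfu_units)
```

The volumes sum to 100 µl, and the computed values match the stated finals of 250 µM for each dNTP and 0.2 µM for each primer, with 5 units of polymerase per reaction.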

PCR Reaction: heat 94°C for 2 minutes; (94°C for 20 seconds;
65°C for 1 minute; 72°C for 2 minutes) x 25 cycles; then 72°C for 5 minutes, then
incubate on ice. The PCR amplified fragment was cut from a gel, and the DNA
purified and stored at -20°C.

To introduce native Renilla luciferase gene fragment into pGL3-control
vector, 5 µg of the PCR product of the native Renilla luciferase gene (RAM-RL-synthetic)
was digested with Nco I and Xba I. The desired Renilla luciferase gene
fragment was purified and stored at -20°C.

Then 100 ng of insert and 100 ng of pGL3-control vector backbone were
digested with restriction enzymes Nco I and Xba I and ligated together. Then 2 µl of
the ligation mixture was transformed into JM109 competent cells. Eight ampicillin
resistant clones were picked and their DNA isolated. DNA from each positive
clone of pGL3-control-native and pGL3-control-synthetic was purified. The correct
sequences for the native gene and the synthetic gene in the vectors were confirmed
by DNA sequencing.

To determine whether the synthetic Renilla luciferase gene has improved
expression in mammalian cells, the gene was cloned into the mammalian expression
vector pGL3-control vector under the control of SV40 promoter and SV40 early
enhancer (Fig. 13A). The native Renilla luciferase gene was also cloned into the
pGL3-control vector so that the expression from synthetic gene and the native gene
could be compared. The expression vectors were then transfected into four common
mammalian cell lines (CHO, NIH3T3, HeLa and CV-1; Table 10), and the
expression levels compared between the vectors with the synthetic gene versus the
native gene. The amount of DNA used was at two different levels to ascertain that
expression from the synthetic gene is consistently increased at different expression
levels. The results show a 70-600 fold increase of expression for the synthetic
Renilla luciferase gene in these cells (Table 10).



Table 10

Enhanced Synthetic Renilla Gene Expression

Cell Type    Amount Vector    Fold Expression Increase
CHO          0.2 µg           142
             2.8 µg           145
NIH3T3       0.2 µg           326
             2.0 µg           593
HeLa         0.2 µg           185
             1.0 µg           103
CV-1         0.2 µg           68
             2.0 µg           72

One important advantage of a luciferase reporter is its short protein half-life.
The enhanced expression could also result from an extended protein half-life and, if so,
this would be an undesired property of the new gene. This possibility was ruled out
by a cycloheximide chase ("CHX Chase") experiment (Figure 14), which
demonstrated that the humanized Renilla luciferase gene did not increase protein
half-life.

To ensure that the increase in expression is not limited to one expression
vector backbone, promoter, or cell type, a synthetic Renilla gene
(Rluc-final) as well as the native Renilla gene were cloned into different vector
backbones and under different promoters (Figure 13B). The synthetic gene always
exhibited increased expression compared to its wild-type counterpart (Table 11).

Table 11

Renilla Gene Expression: native v. synthetic (Rluc-final)

Vector                         NIH-3T3        HeLa           CHO
pRL-tk, native                 3,834.6        922.4          7,671.9
pRL-tk, synthetic              13,252.5       9,040.2        41,743.5
pRL-CMV, native                168,062.2      842,482.5      153,539.5
pRL-CMV, synthetic             2,168,129      8,440,306      2,532,576
pRL-SV40, native               224,224.4      346,787.6      85,323.6
pRL-SV40, synthetic            1,469,588      2,632,510      1,422,830
pRL-null, native               2,853.8        431.7          2,434
pRL-null, synthetic            9,151.17       2,439          28,317.1
pRGL3b, native                 12             21.8           17
pRGL3b, synthetic              130.5          212.4          1,094.5
pRGL3-tk, native               27.9           155.5          186.4
pRGL3-tk, synthetic            6,778.2        8,782.5        9,685.9
pRL-tk no intron, native       31.8           165            93.4
pRL-tk no intron, synthetic    6,665.5        6,379          21,433.1
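
The statement that the synthetic gene always outperformed its wild-type counterpart can be checked by dividing each synthetic reading in Table 11 by the matching native reading. A sketch of that comparison, using the tabulated values in the column order NIH-3T3, HeLa, CHO; the data structure and function name are illustrative.

```python
CELLS = ("NIH-3T3", "HeLa", "CHO")

# (native readings, synthetic readings) per vector, from Table 11.
table11 = {
    "pRL-tk": ((3834.6, 922.4, 7671.9), (13252.5, 9040.2, 41743.5)),
    "pRL-CMV": ((168062.2, 842482.5, 153539.5), (2168129, 8440306, 2532576)),
    "pRL-SV40": ((224224.4, 346787.6, 85323.6), (1469588, 2632510, 1422830)),
    "pRL-null": ((2853.8, 431.7, 2434), (9151.17, 2439, 28317.1)),
    "pRGL3b": ((12, 21.8, 17), (130.5, 212.4, 1094.5)),
    "pRGL3-tk": ((27.9, 155.5, 186.4), (6778.2, 8782.5, 9685.9)),
    "pRL-tk no intron": ((31.8, 165, 93.4), (6665.5, 6379, 21433.1)),
}

def fold_increase(table):
    """Synthetic/native ratio for every vector and cell line."""
    return {
        vector: {cell: s / n for cell, n, s in zip(CELLS, native, synthetic)}
        for vector, (native, synthetic) in table.items()
    }

folds = fold_increase(table11)
for vector, by_cell in folds.items():
    print(vector, {cell: round(f, 1) for cell, f in by_cell.items()})
```

Every ratio is above 1, ranging from about 3-fold (pRL-null and pRL-tk in NIH-3T3 cells) to over 200-fold (pRGL3-tk and pRL-tk no intron).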



Table 12

Renilla Luciferase Expression in Mammalian Cells

                           Percent of control vector
Vector                     CHO cells    NIH3T3 cells    HeLa cells
pRL-control native         100          100             100
pRL-control synthetic      100          100             100
pRL-basic native           4.1          5.6             0.2
pRL-basic synthetic        0.4          0.1             0.0
pRL-promoter native        5.9          7.8             0.6
pRL-promoter synthetic     15.0         9.9             1.1
pRL-enhancer native        42.1         123.9           52.7
pRL-enhancer synthetic     2.6          1.5             5.4

(Vector backbones illustrated in Figure 13A)

With reduced spurious expression, the synthetic gene should exhibit less
basal level transcription in a promoterless vector. The synthetic and native Renilla
luciferase genes were cloned into the pGL3-basic vector to compare the basal level
of transcription. Because the synthetic gene itself has increased expression
efficiency, the activity from the promoterless vector cannot be compared directly to
judge the difference in basal transcription. Instead, this is taken into consideration by
comparing the percentage of activity from the promoterless vector in reference to
the control vector (expression from the basic vector divided by the expression in the
fully functional expression vector with both promoter and enhancer elements). The
data demonstrate that the synthetic Renilla luciferase has a lower level of basal
transcription than the native gene (Table 12).

It is well known to those skilled in the art that an enhancer can substantially
stimulate promoter activity. To test whether the synthetic gene has reduced risk of
inappropriate transcriptional characteristics, the native and synthetic gene were
introduced into a vector with an enhancer element (pGL3-enhancer vector).
Because the synthetic gene has higher expression efficiency, the activity of both
cannot be compared directly to compare the level of transcription in the presence of
the enhancer. Instead, this is taken into account by using the percentage of activity
from the enhancer vector in reference to the control vector (expression in the presence
of enhancer divided by the expression in the fully functional expression vector with
both promoter and enhancer elements). Such results show that when the native gene is
present, the enhancer alone is able to stimulate transcription to 42-124% of the
control; however, when the native gene is replaced by the synthetic gene in the same
vector, the activity constitutes only 1-5% of the value obtained when the same
enhancer and a strong SV40 promoter are employed. This clearly demonstrates that
the synthetic gene has reduced risk of spurious expression (Table 12).
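
The normalization used in both comparisons is the same: activity in the test vector is divided by activity of the same gene in the control vector and expressed as a percentage. The sketch below shows that calculation and then reads the quoted ranges off the enhancer rows of Table 12. The raw readings passed to `percent_of_control` are hypothetical, since Table 12 reports only the resulting percentages.

```python
def percent_of_control(test_rlu: float, control_rlu: float) -> float:
    """Activity in a test vector as a percentage of the same gene in the control vector."""
    return 100.0 * test_rlu / control_rlu

# Hypothetical raw readings, for illustration only.
example = percent_of_control(26.0, 1000.0)

# Enhancer-vector rows of Table 12 (CHO, NIH3T3, HeLa), already in percent.
enhancer = {
    "native": (42.1, 123.9, 52.7),
    "synthetic": (2.6, 1.5, 5.4),
}
ranges = {gene: (min(vals), max(vals)) for gene, vals in enhancer.items()}
print(example, ranges)
```

The native gene spans 42.1 to 123.9 percent of control and the synthetic gene 1.5 to 5.4 percent, which the text rounds to 42-124% and 1-5%.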

The synthetic Renilla gene (Rluc-final) was used in in vitro systems to
compare translation efficiency with the native gene. In a T7 quick coupled
transcription/translation system (Promega Corp., Madison, WI), pRL-null native
plasmid (having the native Renilla luciferase gene under the control of the T7
promoter) or the same amount of pRL-null-synthetic plasmid (having the synthetic
Renilla luciferase gene under the control of the T7 promoter) was added to the TNT
reaction mixture and luciferase activity measured every 5 minutes up to 60 minutes.
Dual Luciferase assay kit (Promega Corp.) was used to measure Renilla luciferase
activity. The data showed that improved expression was obtained from the synthetic
gene (Figure 15A, B). To further evidence the increased translation efficiency of the
synthetic gene, RNA was prepared by an in vitro transcription system, then purified.
pRL-null (native or synthetic) vectors were linearized with BamH I. The DNA was
purified by multiple phenol-chloroform extractions followed by ethanol precipitation.
An in vitro T7 transcription system was employed to prepare RNAs. The DNA
template was removed by using RNase-free DNase, and RNA was purified by
phenol-chloroform extraction followed by multiple isopropanol precipitations. The
same amount of purified RNA, either for the synthetic gene or the native gene, was
then added to a rabbit reticulocyte lysate (Figure 15C, D) or wheat germ lysate
(Figure 15E, F). Again, the synthetic Renilla luciferase gene RNA produced more
luciferase than the native one. These data suggest that the translation efficiency is
improved by the synthetic sequence. To determine why the synthetic gene was
highly expressed in wheat germ, plant codon usage was determined. The lowest
usage codons in higher plants coincided with those in mammals.

Reporter gene assays are widely used to study transcriptional regulation
events. This is often carried out in co-transfection experiments, in which, along
with the primary reporter construct containing the promoter being tested, a second control
reporter under a constitutive promoter is transfected into cells as an internal control
to normalize experimental variations, including transfection efficiencies, between the
samples. Control reporter signal, potential promoter cross talk between the control
reporter and primary reporter, as well as potential regulation of the control reporter
by experimental conditions, are important aspects to consider when selecting a reliable
co-reporter vector.
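The normalization described above amounts to dividing the primary reporter signal by the control reporter signal for each sample. The following is a minimal sketch of that arithmetic only; the function name and the readings are hypothetical and are not taken from the experiments reported here.

```python
def normalize_reporter(primary, control):
    """Divide each primary-reporter reading by the matching control reading."""
    if len(primary) != len(control):
        raise ValueError("primary and control must have the same length")
    ratios = []
    for p, c in zip(primary, control):
        if c <= 0:
            raise ValueError("control reading must be positive")
        ratios.append(p / c)
    return ratios


# Hypothetical readings (relative light units) for three wells whose
# transfection efficiencies differ by up to 4-fold.
firefly_rlu = [1200.0, 2400.0, 600.0]   # primary reporter
renilla_rlu = [100.0, 200.0, 50.0]      # control co-reporter

print(normalize_reporter(firefly_rlu, renilla_rlu))
```

In this made-up case the raw primary signals differ between wells, but the normalized ratios are equal, which is the situation expected when only transfection efficiency varies.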

As described above, vector constructs were made by cloning the synthetic
Renilla luciferase gene into different vector backbones under different promoters.
All the constructs showed higher expression in the three mammalian cell lines tested
(Table 11). Thus, with better expression efficiency, the synthetic Renilla luciferase
gives a higher signal when transfected into mammalian cells.

Because a higher signal is obtained, less promoter activity is required to
achieve the same reporter signal, which reduces the risk of promoter interference. CHO
cells were transfected with 50 ng pGL3-control (firefly luc+) plus one of 5 different
amounts of native pRL-TK plasmid (50, 100, 500, 1000, or 2000 ng) or synthetic
pRL-TK (5, 10, 50, 100, or 200 ng). To each transfection, pUC19 carrier DNA was
added to a total of 3 µg DNA. Figure 16 shows that 10-fold less synthetic pRL-TK
DNA gives a signal similar to or greater than that of the native gene, with
reduced risk of inhibiting expression from the primary reporter pGL3-control.
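The amount of pUC19 carrier DNA in each transfection follows from the stated total of 3 µg DNA. The sketch below works through that subtraction for the plasmid amounts listed above; the function and constant names are illustrative, and it assumes the total consists only of pGL3-control, pRL-TK and carrier.

```python
TOTAL_DNA_NG = 3000     # 3 µg total DNA per transfection
PGL3_CONTROL_NG = 50    # firefly luc+ primary reporter

NATIVE_PRL_TK_NG = [50, 100, 500, 1000, 2000]
SYNTHETIC_PRL_TK_NG = [5, 10, 50, 100, 200]


def carrier_ng(prl_tk_ng, total_ng=TOTAL_DNA_NG, pgl3_ng=PGL3_CONTROL_NG):
    """Return the ng of carrier DNA needed to reach the total DNA amount."""
    carrier = total_ng - pgl3_ng - prl_tk_ng
    if carrier < 0:
        raise ValueError("plasmid amounts exceed the total DNA amount")
    return carrier


for label, amounts in (("native", NATIVE_PRL_TK_NG),
                       ("synthetic", SYNTHETIC_PRL_TK_NG)):
    for ng in amounts:
        print(f"{label} pRL-TK {ng} ng -> pUC19 carrier {carrier_ng(ng)} ng")
```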

Experimental treatment sometimes may activate cryptic sites within the gene
and cause induction or suppression of the co-reporter expression, which would
compromise its function as co-reporter for normalization of transfection efficiencies.
One example is that TPA induces expression of co-reporter vectors harboring the
wild-type gene when transfecting MCF-7 cells. 500 ng pRL-TK (native), 5 µg
native and synthetic pRG-B, and 2.5 µg native and synthetic pRG-TK were transfected
per well of MCF-7 cells. 100 ng/well pGL3-control (firefly luc+) was
co-transfected with all RL plasmids. Carrier DNA, pUC19, was used to bring the total
DNA transfected to 5.1 µg/well. 15.3 µl TransFast Transfection Reagent (Promega
Corp., Madison, WI) was added per well. Sixteen hours later, cells were
trypsinized, pooled and split into six wells of a 6-well dish and allowed to attach to
the well for 8 hours. Three wells were then treated with 0.2 nM of the tumor
promoter TPA (phorbol-12-myristate-13-acetate, Calbiochem #524400-S), and
three wells were mock treated with 20 µl DMSO. Cells were harvested with 0.4 ml
Passive Lysis Buffer 24 hours post TPA addition. The results showed that by using
the synthetic gene, undesirable change of co-reporter expression by experimental
stimuli can be avoided (Table 13). This demonstrates that using the synthetic gene can
reduce the risk of anomalous expression.

Table 13

TPA Induction

Vector                           RLU       Fold Induction
pRL-tk untreated (native)        184
pRL-tk TPA treated (native)      812       4.4
pRG-B untreated (native)         1
pRG-B TPA treated (native)       8         8.0
pRG-B untreated (final)          132
pRG-B TPA treated (final)        195       1.47
pRG-tk untreated (native)        44
pRG-tk TPA treated (native)      192       4.36
pRG-tk untreated (final)         12,816
pRG-tk TPA treated (final)       11,347    0.88
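
The fold induction values in Table 13 are the ratio of the TPA-treated signal to the untreated signal for each vector. As a hedged check of that arithmetic, the sketch below recomputes the ratios from the RLU values in the table; the dictionary layout and function name are illustrative only.

```python
# (untreated RLU, TPA-treated RLU, fold induction as listed) from Table 13
TABLE_13 = {
    "pRL-tk (native)": (184, 812, 4.4),
    "pRG-B (native)": (1, 8, 8.0),
    "pRG-B (final)": (132, 195, 1.47),
    "pRG-tk (native)": (44, 192, 4.36),
    "pRG-tk (final)": (12816, 11347, 0.88),
}


def fold_induction(untreated, treated):
    """Return treated signal divided by untreated signal."""
    if untreated <= 0:
        raise ValueError("untreated signal must be positive")
    return treated / untreated


for vector, (untreated, treated, listed) in TABLE_13.items():
    computed = fold_induction(untreated, treated)
    print(f"{vector}: computed {computed:.3f}, listed {listed}")
```

The recomputed ratios agree with the listed values to within the precision of the table: the native constructs are induced roughly 4-fold to 8-fold by TPA, whereas the synthetic (final) constructs stay close to 1.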

WHAT IS CLAIMED IS:



1. A synthetic nucleic acid molecule comprising at least 300 nucleotides of a
coding region for a polypeptide, having a codon composition differing at
more than 25% of the codons from a wild type nucleic acid sequence
encoding a polypeptide, and having at least 3-fold fewer transcription
regulatory sequences relative to the average number of such sequences 
resulting from random selections of codons at the codons which differ, 
wherein the transcription regulatory sequences are selected from the group 
consisting of transcription factor binding sequences, intron splice sites, 
poly(A) addition sites and promoter sequences, and wherein the polypeptide 
encoded by the synthetic nucleic acid molecule has at least 85% sequence 
identity to the polypeptide encoded by the wild type nucleic acid sequence. 

2. The synthetic nucleic acid molecule of claim 1 wherein the synthetic nucleic 
acid molecule has at least 5-fold fewer transcription regulatory sequences. 

3. The synthetic nucleic acid molecule of claim 1 wherein the codon 
composition of the synthetic nucleic acid molecule differs from the wild 
type nucleic acid sequence at more than 35% of the codons. 

4. The synthetic nucleic acid molecule of claim 1 wherein the codon 
composition of the synthetic nucleic acid molecule differs from the wild 
type nucleic acid sequence at more than 45% of the codons. 

5. The synthetic nucleic acid molecule of claim 1 wherein the codon 
composition of the synthetic nucleic acid molecule differs from the wild type 
nucleic acid sequence at more than 55% of the codons.

6. The synthetic nucleic acid molecule of claim 1 wherein the majority of
codons which differ are ones that are preferred codons of a desired host cell.

7. The synthetic nucleic acid molecule of claim 1 wherein the synthetic nucleic 
acid molecule encodes a reporter molecule. 

8. The synthetic nucleic acid molecule of claim 1 wherein the synthetic nucleic 
acid molecule encodes a selectable marker protein. 

9. The synthetic nucleic acid molecule of claim 1 wherein the synthetic nucleic 
acid molecule encodes a luciferase. 

10. The synthetic nucleic acid molecule of claim 9 wherein the wild type nucleic 
acid sequence encodes a Renilla luciferase.

11. The synthetic nucleic acid molecule of claim 9 wherein the wild type nucleic
acid sequence encodes a beetle luciferase.

12. The synthetic nucleic acid molecule of claim 11 wherein the synthetic
nucleic acid molecule encodes the amino acid valine at position 224.

13. The synthetic nucleic acid molecule of claim 11 wherein the synthetic
nucleic acid molecule encodes the amino acid histidine at position 224, 
histidine at position 247, isoleucine at position 346, glutamine at position 
348, or any combination thereof. 

14. The synthetic nucleic acid molecule of claim 1 wherein the majority of 
codons which differ in the synthetic nucleic acid molecule are those which 
are employed more frequently in mammals.

15. The synthetic nucleic acid molecule of claim 1 wherein the majority of
codons which differ in the synthetic nucleic acid molecule are those which 
are preferred codons in humans. 

16. The synthetic nucleic acid molecule of claim 1 wherein the majority of 
codons which differ in the synthetic nucleic acid molecule are those which 
are preferred codons in plants. 

17. The synthetic nucleic acid molecule of claim 9 wherein the synthetic nucleic 
acid molecule comprises SEQ ID NO:21 (Rlucver2) or SEQ ID NO:22
(Rluc-final).

18. The synthetic nucleic acid molecule of claim 9 wherein the synthetic nucleic
acid molecule comprises SEQ ID NO:7 (GRver5), SEQ ID NO:8 (GRver6),
SEQ ID NO:9 (GRver5.1), or SEQ ID NO:297 (GRver5.1).

19. The synthetic nucleic acid molecule of claim 9 wherein the synthetic nucleic
acid molecule comprises SEQ ID NO:14 (RDver5), SEQ ID NO:15
(RDver7), SEQ ID NO:16 (RDver5.1), SEQ ID NO:299 (RDver5.1), SEQ
ID NO:17 (RDver5.2), SEQ ID NO:18 (RD156-1H9) or SEQ ID NO:301
(RD156-1H9).

20. The synthetic nucleic acid molecule of claim 15 wherein the majority of 
codons which differ are the human codons CGC, CTG, TCT, AGC, ACC, 
CCA, CCT, GCC, GGC, GTG, ATC, ATT, AAG, AAC, CAG, CAC, GAG, 
GAC, TAC, TGC and TTC. 

21. The synthetic nucleic acid molecule of claim 15 wherein the majority of
codons which differ are the human codons CGC, CTG, TCT, ACC, CCA,
GCC, GGC, GTC, and ATC or codons CGT, TTG, AGC, ACT, CCT, GCT,
GGT, GTG and ATT.



22. The synthetic nucleic acid molecule of claim 16 wherein the majority of 
codons which differ are the plant codons CGC, CTT, TCT, TCC, ACC, 
CCA, CCT, GCT, GGA, GTG, ATC, ATT, AAG, AAC, CAA, CAC, GAG, 
GAC, TAC, TGC and TTC. 

23. The synthetic nucleic acid molecule of claim 16 wherein the majority of 
codons which differ are the plant codons CGC, CTT, TCT, ACC, CCA, 
GTC, GGA, GTC, and ATC or codons CGT, TGG, AGC, ACT, CCT, 
GCC, GGT, GTG and ATT. 

24. The synthetic nucleic acid molecule of claim 1 wherein the synthetic nucleic 
acid molecule is expressed in a mammalian host cell at a level which is
greater than that of the wild type nucleic acid sequence.

25. The synthetic nucleic acid molecule of claim 1 wherein the synthetic nucleic 
acid molecule has an increased number of CTG or TTG leucine-encoding 
codons. 

26. The synthetic nucleic acid molecule of claim 1 wherein the synthetic nucleic 
acid molecule has an increased number of GTG or GTC valine-encoding 
codons. 

27. The synthetic nucleic acid molecule of claim 1 wherein the synthetic nucleic 
acid molecule has an increased number of GGC or GGT glycine-encoding 
codons.

28. The synthetic nucleic acid molecule of claim 1 wherein the synthetic nucleic
acid molecule has an increased number of ATC or ATT isoleucine-encoding
codons. 



29. The synthetic nucleic acid molecule of claim 1 wherein the synthetic nucleic 
acid molecule has an increased number of CCA or CCT proline-encoding
codons. 



30. The synthetic nucleic acid molecule of claim 1 wherein the synthetic nucleic
acid molecule has an increased number of CGC or CGT arginine-encoding 
codons. 



31. The synthetic nucleic acid molecule of claim 1 wherein the synthetic nucleic 
acid molecule has an increased number of AGC or TCT serine-encoding 
codons. 



32. The synthetic nucleic acid molecule of claim 1 wherein the synthetic nucleic 
acid molecule has an increased number of ACC or ACT threonine-encoding 
codons. 



33. The synthetic nucleic acid molecule of claim 1 wherein the synthetic nucleic 
acid molecule has an increased number of GCC or GCT alanine-encoding 
codons. 



34. The synthetic nucleic acid molecule of claim 1 wherein the codons in the 
synthetic nucleic acid molecule which differ encode the same amino acids as 
the corresponding codons in the wild type nucleic acid sequence. 

35. A plasmid comprising the synthetic nucleic acid molecule of claim 1.

36. An expression vector comprising the synthetic nucleic acid molecule of
claim 1 linked to a promoter functional in a cell.

37. The expression vector of claim 36 wherein the synthetic nucleic acid 
molecule is operatively linked to a Kozak consensus sequence.

38. The expression vector of claim 36 wherein the promoter is functional in a 
mammalian cell. 

39. The expression vector of claim 36 wherein the promoter is functional in a 
human cell.

40. The expression vector of claim 36 wherein the promoter is functional in a 
plant cell. 

41. The expression vector of claim 36 wherein the expression vector further
comprises a multiple cloning site. 

42. The expression vector of claim 41 wherein the expression vector comprises a 
multiple cloning site positioned between the promoter and the synthetic 
nucleic acid molecule. 

43. The expression vector of claim 41 wherein the expression vector comprises a 
multiple cloning site positioned downstream from the synthetic nucleic acid 
molecule. 

44. A host cell comprising the expression vector of claim 36. 

45. A reporter gene expression kit comprising, in suitable container means, the 
expression vector of claim 36.

46. An isolated polypeptide encoded by SEQ ID NO:9 (GRver5.1) or SEQ ID
NO:18 (RD156-1H9).

47. A polynucleotide which hybridizes under stringent hybridization conditions
to SEQ ID NO:22 (Rluc-final), SEQ ID NO:9 (GRver5.1), SEQ ID NO:18
(RD156-1H9), SEQ ID NO:297 (GRver5.1), SEQ ID NO:301 (RD156-1H9),
or the complement thereof.

48. A method to prepare a synthetic nucleic acid molecule comprising an open
reading frame, comprising: 

a) altering a plurality of transcription regulatory sequences in a parent
nucleic acid sequence which encodes a polypeptide having at least 100 
amino acids to yield a synthetic nucleic acid molecule which has at least 3- 
fold fewer transcription regulatory sequences relative to the parent nucleic 
acid sequence, wherein the transcription regulatory sequences are selected 
from the group consisting of transcription factor binding sequences, intron 
splice sites, poly(A) addition sites, enhancer sequences and promoter 
sequences; and 

b) altering greater than 25% of the codons in the synthetic nucleic acid 
sequence which has a decreased number of transcription regulatory 
sequences to yield a further synthetic nucleic acid molecule, wherein the 
codons which are altered do not result in an increased number of 
transcription regulatory sequences, wherein the further synthetic nucleic acid
molecule encodes a polypeptide with at least 85% amino acid sequence 
identity to the polypeptide encoded by the parent nucleic acid sequence. 

49. A method to prepare a synthetic nucleic acid molecule comprising an open 
reading frame, comprising:

a) altering greater than 25% of the codons in a parent nucleic acid sequence
which encodes a polypeptide having at least 100 amino acids to yield a 
codon-altered synthetic nucleic acid molecule, and 

b) altering a plurality of transcription regulatory sequences in the codon- 
altered synthetic nucleic acid molecule to yield a further synthetic nucleic 
acid molecule which has at least 3-fold fewer transcription regulatory 
sequences relative to a synthetic nucleic acid molecule with a random 
selection of codons at the codons which differ, wherein the transcription 
regulatory sequences are selected from the group consisting of transcription 
factor binding sequences, intron splice sites, poly(A) addition sites, enhancer 
sequences and promoter sequences, and wherein the further synthetic nucleic 
acid molecule encodes a polypeptide with at least 85% amino acid sequence 
identity to the polypeptide encoded by the parent nucleic acid sequence. 

50. The method of claim 48 or 49 wherein the parent nucleic acid sequence 
encodes a reporter molecule. 

51. The method of claim 48 or 49 wherein the parent nucleic acid sequence
encodes a luciferase. 

52. The method of claim 48 or 49 wherein the synthetic nucleic acid molecule 
hybridizes under medium stringency hybridization conditions to the parent 
nucleic acid sequence. 

53. The method of claim 48 or 49 wherein the codons which are altered encode 
the same amino acid as the corresponding codons in the parent nucleic acid 
sequence. 

54. A synthetic nucleic acid molecule which is the further synthetic nucleic acid 
molecule prepared by the method of claim 48 or 49.

55. A method for preparing at least two synthetic nucleic acid molecules which
are codon distinct versions of a parent nucleic acid sequence which encodes 
a polypeptide, comprising: 

a) altering a parent nucleic acid sequence to yield a synthetic nucleic 
acid molecule having an increased number of a first plurality of codons 
that are employed more frequently in a selected host cell relative to the 
number of those codons in the parent nucleic acid sequence; and 

b) altering the parent nucleic acid sequence to yield a further synthetic
nucleic acid molecule having an increased number of a second plurality of
codons that are employed more frequently in the host cell relative to the
number of those codons in the parent nucleic acid sequence, wherein the first
plurality of codons is different than the second plurality of codons, and
wherein the synthetic and the further synthetic nucleic acid molecules 
encode the same polypeptide. 



56. The method of claim 55 further comprising altering a plurality of
transcription regulatory sequences in the synthetic nucleic acid molecule, the 
further synthetic nucleic acid molecule, or both, to yield at least one yet 
further synthetic nucleic acid molecule which has at least 3-fold fewer 
transcription regulatory sequences relative to the synthetic nucleic acid 
molecule, the further synthetic nucleic acid molecule, or both. 

57. The method of claim 55 further comprising altering at least one codon in the 
first synthetic sequence to yield a first modified synthetic sequence
which encodes a polypeptide with at least one amino acid substitution
relative to the polypeptide encoded by the first synthetic nucleic acid
sequence.

58. The method of claim 56 further comprising altering at least one codon in the
second synthetic sequence to yield a second modified synthetic sequence 
which encodes a polypeptide with at least one amino acid substitution 
relative to the polypeptide encoded by the first synthetic nucleic acid 
sequence. 

59. The method of claim 55 wherein the synthetic sequences encode a luciferase.

60. The synthetic nucleic acid molecule of claim 1 wherein the synthetic nucleic
acid molecule is expressed at a level which is at least 110% of that of
the wild type nucleic acid sequence in a cell or cell extract under identical
conditions.

61. The synthetic nucleic acid molecule of claim 1 wherein the polypeptide
encoded by the synthetic nucleic acid molecule has at least 90% contiguous 
sequence identity to the polypeptide encoded by the wild type nucleic acid 
sequence. 

62. The synthetic nucleic acid molecule of claim 1 wherein the polypeptide 
encoded by the synthetic nucleic acid molecule is identical in amino acid 
sequence to the polypeptide encoded by the wild type nucleic acid sequence. 

63. A vector comprising a synthetic nucleic acid molecule having at least 3-fold
fewer transcriptional regulatory sequences relative to a vector comprising a 
parent nucleic acid sequence, wherein the transcription regulatory sequences 
are selected from the group consisting of transcription factor binding 
sequences, intron splice sites, poly(A) addition sites and promoter sequences. 

64. The vector of claim 63 wherein the synthetic nucleic acid molecule does not 
encode a polypeptide.

65. The method of claim 48 or 49 further comprising altering the further
synthetic nucleic acid molecule to encode a polypeptide having at least one 
amino acid substitution relative to the polypeptide encoded by the parent 
nucleic acid sequence. 

66. The method of claim 48 or 49 wherein the altering of transcription regulatory 
sequences does not introduce amino acid substitutions to the polypeptide 
encoded by the synthetic nucleic acid molecule.

Abstract of the Disclosure

A method to prepare synthetic nucleic acid molecules having reduced
inappropriate or unintended transcriptional characteristics when expressed in a
particular host cell.

Figure 2 (cont.)

Figure 3

Figure 3 (cont.)

Figure 3 (cont.)

Codon Usage Analysis

[Table not legible in the source. First panel: codon counts per 542 codons for YG#81-6G01, ver1 GR, ver1 RD, ver5 GR, ver5 RD and HUM. Second panel: codon usage for each amino acid, in percent, for YG#81-6G01, ver5 GR, ver5 RD and HUM.]

Figure 5A 

Codon Usage: YG#81-6G01 (yellow-green)

TTT  Phe  14    TCT  Ser   7    TAT  Tyr  11    TGT  Cys   8
TTC  Phe  11    TCC  Ser   2    TAC  Tyr   8    TGC  Cys   3
TTA  Leu  17    TCA  Ser   6    TAA  ***   0    TGA  ***   0
TTG  Leu  13    TCG  Ser   7    TAG  ***   0    TGG  Trp   2
CTT  Leu  12    CCT  Pro   9    CAT  His   7    CGT  Arg   5
CTC  Leu   4    CCC  Pro   8    CAC  His   6    CGC  Arg   1
CTA  Leu   5    CCA  Pro   9    CAA  Gln   8    CGA  Arg   7
CTG  Leu   4    CCG  Pro   2    CAG  Gln   6    CGG  Arg   0
ATT  Ile  19    ACT  Thr   8    AAT  Asn  16    AGT  Ser   7
ATC  Ile   7    ACC  Thr   2    AAC  Asn   6    AGC  Ser   2
ATA  Ile  12    ACA  Thr  10    AAA  Lys  23    AGA  Arg   6
ATG  Met  11    ACG  Thr   2    AAG  Lys  12    AGG  Arg   7
GTT  Val  20    GCT  Ala  15    GAT  Asp  20    GGT  Gly  16
GTC  Val   4    GCC  Ala   4    GAC  Asp   6    GGC  Gly   3
GTA  Val  13    GCA  Ala  14    GAA  Glu  26    GGA  Gly  18
GTG  Val  12    GCG  Ala   5    GAG  Glu  12    GGG  Gly   2
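
Codon counts such as those listed for YG#81-6G01 can be converted to a percent usage within each amino acid by dividing each count by the total for the synonymous codons. The sketch below does this for the six arginine codons, using the counts from the YG#81-6G01 listing; the function name is illustrative and not part of the original disclosure.

```python
def percent_usage(counts):
    """Return each codon's share (in percent) of the total for one amino acid."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("amino acid does not occur in the sequence")
    return {codon: 100.0 * n / total for codon, n in counts.items()}


# Arginine codon counts for YG#81-6G01 (542 codons in total).
arg_counts = {"CGT": 5, "CGC": 1, "CGA": 7, "CGG": 0, "AGA": 6, "AGG": 7}

for codon, pct in percent_usage(arg_counts).items():
    print(f"{codon}: {pct:.0f}%")
```

With these counts arginine occurs 26 times, so CGA, for example, accounts for 7 of 26 arginine codons, or about 27%.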



Figure 5B 



Codon Usage: GRver1

TTT  Phe  12    TCT  Ser  16    TAT  Tyr   9    TGT  Cys   5
TTC  Phe  13    TCC  Ser   0    TAC  Tyr  10    TGC  Cys   6
TTA  Leu   0    TCA  Ser   0    TAA  ***   0    TGA  ***   0
TTG  Leu  27    TCG  Ser   0    TAG  ***   0    TGG  Trp   2
CTT  Leu   0    CCT  Pro  14    CAT  His   6    CGT  Arg  13
CTC  Leu   0    CCC  Pro   0    CAC  His   7    CGC  Arg  13
CTA  Leu   0    CCA  Pro  14    CAA  Gln   7    CGA  Arg   0
CTG  Leu  28    CCG  Pro   0    CAG  Gln   7    CGG  Arg   0
ATT  Ile  19    ACT  Thr  11    AAT  Asn  11    AGT  Ser   0
ATC  Ile  19    ACC  Thr  11    AAC  Asn  11    AGC  Ser  15
ATA  Ile   0    ACA  Thr   0    AAA  Lys  17    AGA  Arg   0
ATG  Met  11    ACG  Thr   0    AAG  Lys  18    AGG  Arg   0
GTT  Val   0    GCT  Ala  18    GAT  Asp  13    GGT  Gly  19
GTC  Val  25    GCC  Ala  19    GAC  Asp  13    GGC  Gly  20
GTA  Val   0    GCA  Ala   0    GAA  Glu  19    GGA  Gly   0
GTG  Val  25    GCG  Ala   0    GAG  Glu  19    GGG  Gly   0



Figure 5C 



Codon Usage: RDver1

TTT  Phe  13    TCT  Ser  15    TAT  Tyr  10    TGT  Cys   6
TTC  Phe  12    TCC  Ser   0    TAC  Tyr  10    TGC  Cys   5
TTA  Leu   0    TCA  Ser   0    TAA  ***   0    TGA  ***   0
TTG  Leu  27    TCG  Ser   0    TAG  ***   0    TGG  Trp   2
CTT  Leu   0    CCT  Pro  14    CAT  His   7    CGT  Arg  13
CTC  Leu  [illegible]    CCC  Pro   0    CAC  His  [illegible]    CGC  Arg  13
CTA  Leu   0    CCA  Pro  14    CAA  Gln   8    CGA  Arg   0
CTG  Leu  27    CCG  Pro   0    CAG  Gln   7    CGG  Arg   0
ATT  Ile  20    ACT  Thr  11    AAT  Asn  10    AGT  Ser   0
ATC  Ile  19    ACC  Thr  11    AAC  Asn  11    AGC  Ser  15
ATA  Ile   0    ACA  Thr   0    AAA  Lys  18    AGA  Arg   0
ATG  Met  11    ACG  Thr   0    AAG  Lys  17    AGG  Arg   0
GTT  Val   0    GCT  Ala  19    GAT  Asp  13    GGT  Gly  20
GTC  Val  24    GCC  Ala  18    GAC  Asp  13    GGC  Gly  19
GTA  Val   0    GCA  Ala   0    GAA  Glu  19    GGA  Gly   0
GTG  Val  25    GCG  Ala   0    GAG  Glu  19    GGG  Gly   0



Figure 5D 



Codon Usage: GRver2

TTT  Phe  12    TCT  Ser  15    TAT  Tyr   9    TGT  Cys   5
TTC  Phe  13    TCC  Ser   0    TAC  Tyr  10    TGC  Cys   6
TTA  Leu   0    TCA  Ser   0    TAA  ***   0    TGA  ***   0
TTG  Leu  27    TCG  Ser   0    TAG  ***   0    TGG  Trp   2
CTT  Leu   0    CCT  Pro  14    CAT  His   6    CGT  Arg  13
CTC  Leu   0    CCC  Pro   0    CAC  His   7    CGC  Arg  13
CTA  Leu   0    CCA  Pro  14    CAA  Gln  10    CGA  Arg   0
CTG  Leu  28    CCG  Pro   0    CAG  Gln   4    CGG  Arg   0
ATT  Ile  20    ACT  Thr  11    AAT  Asn  11    AGT  Ser   0
ATC  Ile  18    ACC  Thr  11    AAC  Asn  11    AGC  Ser  16
ATA  Ile   0    ACA  Thr   0    AAA  Lys  16    AGA  Arg   0
ATG  Met  11    ACG  Thr   0    AAG  Lys  19    AGG  Arg   0
GTT  Val   0    GCT  Ala  18    GAT  Asp  13    GGT  Gly  18
GTC  Val  28    GCC  Ala  19    GAC  Asp  13    GGC  Gly  21
GTA  Val   0    GCA  Ala   0    GAA  Glu  17    GGA  Gly   0
GTG  Val  22    GCG  Ala   0    GAG  Glu  21    GGG  Gly   0



Figure 5E 



Codon Usage: RDver2

TTT  Phe  13    TCT  Ser  16    TAT  Tyr  10    TGT  Cys   6
TTC  Phe  12    TCC  Ser   0    TAC  Tyr  10    TGC  Cys   5
TTA  Leu   0    TCA  Ser   0    TAA  ***   0    TGA  ***   0
TTG  Leu  27    TCG  Ser   0    TAG  ***   0    TGG  Trp   2
CTT  Leu   0    CCT  Pro  15    CAT  His   7    CGT  Arg  13
CTC  Leu   1    CCC  Pro   0    CAC  His   6    CGC  Arg  13
CTA  Leu   0    CCA  Pro  13    CAA  Gln   8    CGA  Arg   0
CTG  Leu  27    CCG  Pro   0    CAG  Gln   7    CGG  Arg   0
ATT  Ile  19    ACT  Thr  11    AAT  Asn  10    AGT  Ser   0
ATC  Ile  20    ACC  Thr  11    AAC  Asn  11    AGC  Ser  14
ATA  Ile   0    ACA  Thr   0    AAA  Lys  19    AGA  Arg   0
ATG  Met  11    ACG  Thr   0    AAG  Lys  16    AGG  Arg   0
GTT  Val   0    GCT  Ala  19    GAT  Asp  13    GGT  Gly  21
GTC  Val  21    GCC  Ala  17    GAC  Asp  13    GGC  Gly  18
GTA  Val   0    GCA  Ala   1    GAA  Glu  21    GGA  Gly   0
GTG  Val  28    GCG  Ala   0    GAG  Glu  17    GGG  Gly   0



Figure 5F 



Codon Usage: GRver3

TTT  Phe  13    TCT  Ser  16    TAT  Tyr   9    TGT  Cys   7
TTC  Phe  12    TCC  Ser   0    TAC  Tyr  10    TGC  Cys   4
TTA  Leu   0    TCA  Ser   0    TAA  ***   0    TGA  ***   0
TTG  Leu  26    TCG  Ser   0    TAG  ***   0    TGG  Trp   2
CTT  Leu   0    CCT  Pro  18    CAT  His   6    CGT  Arg  14
CTC  Leu   5    CCC  Pro   0    CAC  His   7    CGC  Arg  12
CTA  Leu   0    CCA  Pro  10    CAA  Gln   9    CGA  Arg   0
CTG  Leu  24    CCG  Pro   0    CAG  Gln   5    CGG  Arg   0
ATT  Ile  14    ACT  Thr  14    AAT  Asn  11    AGT  Ser   0
ATC  Ile  24    ACC  Thr   8    AAC  Asn  11    AGC  Ser  15
ATA  Ile   0    ACA  Thr   0    AAA  Lys  21    AGA  Arg   0
ATG  Met  11    ACG  Thr   0    AAG  Lys  14    AGG  Arg   0
GTT  Val   1    GCT  Ala  18    GAT  Asp  12    GGT  Gly  18
GTC  Val  22    GCC  Ala  18    GAC  Asp  14    GGC  Gly  21
GTA  Val   0    GCA  Ala   1    GAA  Glu  20    GGA  Gly   0
GTG  Val  27    GCG  Ala   0    GAG  Glu  18    GGG  Gly   0



Figure 5G 



Codon Usage : RDverS 



TTT  Phe  13    TCT  Ser  14    TAT  Tyr   7    TGT  Cys   6
TTC  Phe  12    TCC  Ser   1    TAC  Tyr  13    TGC  Cys   5
TTA  Leu   0    TCA  Ser   0    TAA  ***   0    TGA  ***   0
TTG  Leu  27    TCG  Ser   0    TAG  ***   0    TGG  Trp   2
CTT  Leu   0    CCT  Pro  16    CAT  His  10    CGT  Arg  16
CTC  Leu   6    CCC  Pro   0    CAC  His   3    CGC  Arg  10
CTA  Leu   0    CCA  Pro  12    CAA  Gln   8    CGA  Arg   0
CTG  Leu  22    CCG  Pro   0    CAG  Gln   7    CGG  Arg   0
ATT  Ile  20    ACT  Thr  10    AAT  Asn  10    AGT  Ser   0
ATC  Ile  19    ACC  Thr  12    AAC  Asn  11    AGC  Ser  15
ATA  Ile   0    ACA  Thr   0    AAA  Lys  13    AGA  Arg   0
ATG  Met  11    ACG  Thr   0    AAG  Lys  22    AGG  Arg   0
GTT  Val   0    GCT  Ala  20    GAT  Asp  14    GGT  Gly  16
GTC  Val  27    GCC  Ala  16    GAC  Asp  12    GGC  Gly  23
GTA  Val   0    GCA  Ala   1    GAA  Glu  18    GGA  Gly   0
GTG  Val  22    GCG  Ala   0    GAG  Glu  20    GGG  Gly   0



Figure 5H 



Codon Usage: GRver4 



TTT  Phe  11    TCT  Ser  13    TAT  Tyr   7    TGT  Cys   8
TTC  Phe  14    TCC  Ser   2    TAC  Tyr  12    TGC  Cys   3
TTA  Leu   0    TCA  Ser   1    TAA  ***   0    TGA  ***   0
TTG  Leu  21    TCG  Ser   0    TAG  ***   0    TGG  Trp   2
CTT  Leu   1    CCT  Pro  18    CAT  His   7    CGT  Arg  14
CTC  Leu  11    CCC  Pro   0    CAC  His   6    CGC  Arg  11
CTA  Leu   0    CCA  Pro  10    CAA  Gln  11    CGA  Arg   1
CTG  Leu  22    CCG  Pro   0    CAG  Gln   3    CGG  Arg   0
ATT  Ile  13    ACT  Thr  14    AAT  Asn  11    AGT  Ser   1
ATC  Ile  25    ACC  Thr   8    AAC  Asn  11    AGC  Ser  14
ATA  Ile   0    ACA  Thr   0    AAA  Lys  20    AGA  Arg   0
ATG  Met  11    ACG  Thr   0    AAG  Lys  15    AGG  Arg   0
GTT  Val   3    GCT  Ala  19    GAT  Asp  12    GGT  Gly  17
GTC  Val  22    GCC  Ala  15    GAC  Asp  14    GGC  Gly  19
GTA  Val   0    GCA  Ala   3    GAA  Glu  20    GGA  Gly   3
GTG  Val  25    GCG  Ala   0    GAG  Glu  18    GGG  Gly   0



Figure 5I 



Codon Usage : RDver4 



TTT  Phe  13    TCT  Ser  11    TAT  Tyr   7    TGT  Cys   7
TTC  Phe  12    TCC  Ser   2    TAC  Tyr  13    TGC  Cys   4
TTA  Leu   0    TCA  Ser   2    TAA  ***   0    TGA  ***   0
TTG  Leu  28    TCG  Ser   0    TAG  ***   0    TGG  Trp   2
CTT  Leu   0    CCT  Pro  16    CAT  His  11    CGT  Arg  15
CTC  Leu   7    CCC  Pro   2    CAC  His   2    CGC  Arg  11
CTA  Leu   0    CCA  Pro  10    CAA  Gln   7    CGA  Arg   0
CTG  Leu  20    CCG  Pro   0    CAG  Gln   8    CGG  Arg   0
ATT  Ile  21    ACT  Thr  11    AAT  Asn  10    AGT  Ser   1
ATC  Ile  18    ACC  Thr  11    AAC  Asn  11    AGC  Ser  14
ATA  Ile   0    ACA  Thr   0    AAA  Lys  13    AGA  Arg   0
ATG  Met  11    ACG  Thr   0    AAG  Lys  22    AGG  Arg   0
GTT  Val   3    GCT  Ala  22    GAT  Asp  15    GGT  Gly  14
GTC  Val  27    GCC  Ala  11    GAC  Asp  11    GGC  Gly  21
GTA  Val   0    GCA  Ala   4    GAA  Glu  18    GGA  Gly   4
GTG  Val  19    GCG  Ala   0    GAG  Glu  20    GGG  Gly   0



Figure 5J 



Codon Usage: GRverS 



TTT  Phe  10    TCT  Ser  11    TAT  Tyr   7    TGT  Cys   8
TTC  Phe  15    TCC  Ser   4    TAC  Tyr  12    TGC  Cys   3
TTA  Leu   0    TCA  Ser   1    TAA  ***   0    TGA  ***   0
TTG  Leu  23    TCG  Ser   0    TAG  ***   0    TGG  Trp   2
CTT  Leu   1    CCT  Pro  17    CAT  His   6    CGT  Arg  13
CTC  Leu  12    CCC  Pro   2    CAC  His   7    CGC  Arg  11
CTA  Leu   0    CCA  Pro   9    CAA  Gln  11    CGA  Arg   2
CTG  Leu  19    CCG  Pro   0    CAG  Gln   3    CGG  Arg   0
ATT  Ile  15    ACT  Thr  14    AAT  Asn   9    AGT  Ser   1
ATC  Ile  23    ACC  Thr   8    AAC  Asn  13    AGC  Ser  14
ATA  Ile   0    ACA  Thr   0    AAA  Lys  19    AGA  Arg   0
ATG  Met  11    ACG  Thr   0    AAG  Lys  16    AGG  Arg   0
GTT  Val   3    GCT  Ala  18    GAT  Asp  12    GGT  Gly  16
GTC  Val  21    GCC  Ala  14    GAC  Asp  14    GGC  Gly  21
GTA  Val   1    GCA  Ala   5    GAA  Glu  19    GGA  Gly   1
GTG  Val  25    GCG  Ala   0    GAG  Glu  19    GGG  Gly   1



Figure 5K 



Codon Usage: RDverS 



TTT  Phe  13    TCT  Ser  12    TAT  Tyr   7    TGT  Cys   7
TTC  Phe  12    TCC  Ser   2    TAC  Tyr  13    TGC  Cys   4
TTA  Leu   0    TCA  Ser   2    TAA  ***   0    TGA  ***   0
TTG  Leu  25    TCG  Ser   0    TAG  ***   0    TGG  Trp   2
CTT  Leu   1    CCT  Pro  15    CAT  His   9    CGT  Arg  14
CTC  Leu  11    CCC  Pro   1    CAC  His   4    CGC  Arg  12
CTA  Leu   0    CCA  Pro  12    CAA  Gln   7    CGA  Arg   0
CTG  Leu  18    CCG  Pro   0    CAG  Gln   8    CGG  Arg   0
ATT  Ile  19    ACT  Thr  10    AAT  Asn   9    AGT  Ser   2
ATC  Ile  20    ACC  Thr  11    AAC  Asn  12    AGC  Ser  12
ATA  Ile   0    ACA  Thr   1    AAA  Lys  13    AGA  Arg   0
ATG  Met  11    ACG  Thr   0    AAG  Lys  22    AGG  Arg   0
GTT  Val   5    GCT  Ala  21    GAT  Asp  14    GGT  Gly  14
GTC  Val  26    GCC  Ala  12    GAC  Asp  12    GGC  Gly  21
GTA  Val   1    GCA  Ala   4    GAA  Glu  18    GGA  Gly   3
GTG  Val  17    GCG  Ala   0    GAG  Glu  20    GGG  Gly   1
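
The codon usage figures above tabulate how often each of the 64 codons occurs in a given gene version. As a minimal sketch of how such a tally might be produced, assuming an in-frame coding sequence written 5' to 3', the following Python fragment counts codons. The function name `codon_usage` and the short example fragment are hypothetical and are used only for illustration; they are not taken from the figures.

```python
from collections import Counter


def codon_usage(cds: str) -> Counter:
    """Count each in-frame codon of a coding sequence written 5' to 3'."""
    # Drop whitespace and normalize case so pasted, wrapped sequences work.
    seq = "".join(cds.split()).upper()
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return Counter(seq[i:i + 3] for i in range(0, len(seq), 3))


# Hypothetical 24-nt in-frame fragment, for illustration only.
example = "ATGGTGAAACGCGAAAAGAACGTG"
usage = codon_usage(example)
for codon, count in sorted(usage.items()):
    print(codon, count)
```

Codons that never occur simply report a count of zero when looked up, which corresponds to the zero entries in the tables.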



Figure 6 



Synthetic oligos for engineered GR/RD genes 
(All oligos listed 5' to 3') 

Coding strand: 5' ( )n 3' 

Non-coding strand: 3' ( )n 5' 

Oligos with pRAM flanking sequence identical for GR/RD 

1) coding strand upstream flanking 

RAM-C1: ACGCCAGCCCAAGCTTAGGCCTGAGTGGC (SEQ ID NO: 35) 

RAM-C2: CTTAATTCTCCCCATCCCCCTGTTGACAATTAATCATCGGCTCG (SEQ ID NO: 36) 
RAM-C3: TATAATGTGAGGAATTGCGAGCGGATAACAATTTCACACA (SEQ ID NO: 37) 



2) coding strand downstream flanking 



RAM-C4: ATGGGATGTTACCTAGACCAATATGAAATATTTGGTAAAT (SEQ ID NO: 38) 
RAM-C5: AAATGCTTAATGAATTTCAAAAAAAAAAAAAAAGGAATTC (SEQ ID NO: 39) 
RAM-C6: GATATCAAGCTTATCGATACCGTCGACCTCGAGGATTATA (SEQ ID NO: 40) 
RAM-C7: TAGAAAAAGGCCTCGGCGGCCGCTAGTTCAGTCAGTT (SEQ ID NO: 41) 


3) non-coding strand downstream flanking 

RAM-N1: AACTGACTGAACTAGCG (SEQ ID NO: 42) 
RAM-N2: GCCGCCGAGGCCTTTTTCTATATAATCCTCGAGGTCGACG (SEQ ID NO: 43) 
RAM-N3: GTATCGATAAGCTTGATATCGAATTCCTTTTTTTTTTTTT (SEQ ID NO: 44) 
RAM-N3b:AGCTTGATATCGAATTCCTTTTTTTTTTTTTTTGAAATTC (SEQ ID NO: 45) 
RAM-N4: TTGAAATTCATTAAGCATTTATTTACCAAATATTTCATAT (SEQ ID NO: 46) 
RAM-N5: TGGTCTAGGTAACATCCCATCACTAGCTTTTTTTTCTATA (SEQ ID NO: 47) 



4) non-coding strand upstream flanking 

RAM-N6: TCGCAATTCCTCACATTATACGAGCCGATGATTAATTGTC (SEQ ID NO: 48) 
RAM-N7: AACAGGGGGATGGGGAGAATTAAGGCCACTCAGGCCTAAGCTTGGGCTGGCGT (SEQ ID NO: 49) 

GRver5 with flanking sequence of pRAM to end of Sfi I primers 

1) Coding strand (Start and stop codons are underlined) 



GR-C1: GGAAACAGGATCCCATGATGAAACGCGAAAAGAACGTGAT (SEQ ID NO: 50) 
GR-C2: CTACGGCCCAGAACCACTGCATCCACTGGAAGACCTCACC (SEQ ID NO: 51) 
GR-C3: GCTGGTGAGATGCTCTTCCGAGCACTGCGTAAACATAGTC (SEQ ID NO: 52) 
GR-C4: ACCTCCCTCAAGCACTCGTGGACGTCGTGGGAGACGAGAG (SEQ ID NO: 53) 
GR-C5: CCTCTCCTACAAAGAATTTTTCGAAGCTACTGTGCTGTTG (SEQ ID NO: 54) 
GR-C6: GCCCAAAGCCTCCATAATTGTGGGTACAAAATGAACGATG (SEQ ID NO: 55) 
GR-C7: TGGTGAGCATTTGTGCTGAGAATAACACTCGCTTCTTTAT (SEQ ID NO: 56) 
GR-C8: TCCTGTAATCGCTGCTTGGTACATCGGCATGATTGTCGCC (SEQ ID NO: 57) 
GR-C9: CCTGTGAATGAATCTTACATCCCAGATGAGCTGTGTAAGG (SEQ ID NO: 58) 
GR-C10:TTATGGGTATTAGCAAACCTCAAATCGTCTTTACTACCAA (SEQ ID NO: 59) 
GR-C11:AAACATCTTGAATAAGGTCTTGGAAGTCCAGTCTCGTACT (SEQ ID NO: 60) 
GR-C12:AACTTCATCAAACGCATCATTATTCTGGATACCGTCGAAA (SEQ ID NO: 61) 
GR-C13:ACATCCACGGCTGTGAGAGCCTCCCTAACTTCATCTCTCG (SEQ ID NO: 62) 
GR-C14:TTACAGCGATGGTAATATCGCTAATTTCAAGCCCTTGCAT (SEQ ID NO: 63) 
GR-C15:TTTGATCCAGTCGAGCAAGTGGCCGCTATTTTGTGCTCCT (SEQ ID NO: 64) 
GR-C16:CCGGCACCACTGGTTTGCCTAAAGGTGTCATGCAGACTCA (SEQ ID NO: 65) 
GR-C17:CCAGAATATCTGTGTGCGTTTGATCCACGCTCTCGACCCT (SEQ ID NO: 66) 
GR-C18:CGTGTGGGTACTCAATTGATCCCTGGCGTGACTGTGCTGG (SEQ ID NO: 67) 
GR-C19:TGTATCTGCCTTTCTTTCACGCCTTTGGTTTCTCTATTAC (SEQ ID NO: 68) 
GR-C20:CCTGGGCTATTTCATGGTCGGCTTGCGTGTCATCATGTTT (SEQ ID NO: 69) 
GR-C21:CGTCGCTTCGACCAAGAAGCCTTCTTGAAGGCTATTCAAG (SEQ ID NO: 70) 
GR-C22:ACTACGAGGTGCGTTCCGTGATCAACGTCCCTTCAGTCAT (SEQ ID NO: 71) 
GR-C23:TTTGTTCCTGAGCAAATCTCCTTTGGTTGACAAGTATGATCTG (SEQ ID NO: 72) 
GR-C24:AGCAGCTTGCGTGAGCTGTGCTGTGGCGCTGCTCCTT (SEQ ID NO: 73) 
GR-C25:TGGCCAAAGAAGTGGCCGAGGTCGCTGCTAAGCGTCTGAA (SEQ ID NO: 74) 
GR-C26:CCTCCCTGGTATCCGCTGCGGTTTTGGTTTGACTGAGAGC (SEQ ID NO: 75) 
GR-C27:ACTTCTGCTAACATCCATAGCTTGCGAGACGAGTTTAAGT (SEQ ID NO: 76) 
GR-C28:CTGGTAGCCTGGGTCGCGTGACTCCTCTTATGGCTGCAAA (SEQ ID NO: 77) 
GR-C29:GATCGCCGACCGTGAGACCGGCAAAGCACTGGGCCCAAAT (SEQ ID NO: 78) 
GR-C30:CAAGTCGGTGAATTGTGTATTAAGGGCCCTATGGTCTCTA (SEQ ID NO: 79) 
GR-C31:AAGGCTACGTGAACAATGTGGAGGCCACTAAAGAAGCCAT (SEQ ID NO: 80) 
GR-C32:TGATGATGATGGCTGGCTCCATAGCGGCGACTTCGGTTAC (SEQ ID NO: 81) 
GR-C33:TATGATGAGGACGAACACTTCTATGTGGTCGATCGCTACA (SEQ ID NO: 82) 
GR-C34:AAGAATTGATTAAGTACAAAGGCTCTCAAGTCGCACCAGC (SEQ ID NO: 83) 
GR-C35:CGAACTGGAAGAAATTTTGCTGAAGAACCCTTGTATCCGC (SEQ ID NO: 84) 
GR-C36:GACGTGGCCGTCGTGGGTATCCCAGACTTGGAAGCTGGCG (SEQ ID NO: 85) 
GR-C37:AGTTGCCTAGCGCCTTTGTGGTGAAACAACCCGGCAAGGA (SEQ ID NO: 86) 
GR-C38:GATCACTGCTAAGGAGGTCTACGACTATTTGGCCGAGCGC (SEQ ID NO: 87) 
GR-C39:GTGTCTCACACCAAATATCTGCGTGGCGGCGTCCGCTTCG (SEQ ID NO: 88) 
GR-C40:TCGATTCTATTCCACGCAACGTTACCGGTAAGATCACTCG (SEQ ID NO: 89) 
GR-C41:TAAAGAGTTGCTGAAGCAACTCCTCGAAAAAGCTGGCGGC (SEQ ID NO: 90) 
GR-C42:TAGTAAAGTCTTCATGATTATATAGAAAAAAAAGCTAGTG (SEQ ID NO: 91) 

2) non-coding strand 

GR-N1: TAATCATGAAGACTTTACTAGCCGCCAGCTTTTTCGAGGA (SEQ ID NO: 92) 

GR-N2: GTTGCTTCAGCAACTCTTTACGAGTGATCTTACCGGTAAC (SEQ ID NO: 93) 

GR-N3: GTTGCGTGGAATAGAATCGACGAAGCGGACGCCGCCACG (SEQ ID NO: 94) 

GR-N4: CAGATATTTGGTGTGAGACACGCGCTCGGCCAAATAGTCGT (SEQ ID NO: 95) 

GR-N5: AGACCTCCTTAGCAGTGATCTCCTTGCCGGGTTGTTTCAC (SEQ ID NO: 96) 

GR-N6: CACAAAGGCGCTAGGCAACTCGCCAGCTTCCAAGTCTGGG (SEQ ID NO: 97) 

GR-N7: ATACCCACGACGGCCACGTCGCGGATACAAGGGTTCTTCA (SEQ ID NO: 98) 

GR-N8: GCAAAATTTCTTCCAGTTCGGCTGGTGCGACTTGAGAGCC (SEQ ID NO: 99) 

GR-N9: TTTGTACTTAATCAATTCTTTGTAGCGATCGACCACATAG (SEQ ID NO: 100) 

GR-N10:AAGTGTTCGTCCTCATCATAGTAACCGAAGTCGCCGCTAT (SEQ ID NO: 101) 

GR-N11:GGAGCCAGCCATCATCATCAATGGCTTCTTTAGTGGCCTC (SEQ ID NO: 102) 

GR-N12:CACATTGTTCACGTAGCCTTTAGAGACCATAGGGCCCTTA (SEQ ID NO: 103) 

GR-N13 :ATACACAATTCACCGACTTGATTTGGGCCCAGTGCTTTGC (SEQ ID NO: 104) 

GR-N14:CGGTCTCACGGTCGGCGATCTTTGCAGCCATAAGAGGAGT (SEQ ID NO: 105) 

GR-N15:CACGCGACCCAGGCTACCAGACTTAAACTCGTCTCGCAAG (SEQ ID NO: 106) 

GR-N16:CTATGGATGTTAGCAGAAGTGCTCTCAGTCAAACCAAAAC (SEQ ID NO: 107) 

GR-N17:CGCAGCGGATACCAGGGAGGTTCAGACGCTTAGCAGCGAC (SEQ ID NO: 108) 

GR-N18:CTCGGCCACTTCTTTGGCCAAAGGAGCAGCGCCACAGCAC (SEQ ID NO: 109) 

GR-N19:AGCTCACGCAAGCTGCTCAGATCATACTTGTCAACCAAAG (SEQ ID NO: 110) 

GR-N20:GAGATTTGCTCAGGAACAAAATGACTGAAGGGACGTTGAT (SEQ ID NO: 111) 

GR-N21:CACGGAACGCACCTCGTAGTCTTGAATAGCCTTCAA (SEQ ID NO: 112) 
GR-N22:GAAGGCTTCTTGGTCGAAGCGACGAAACATGATGACACGCAAGC (SEQ ID NO: 113) 

GR-N23:CGACCATGAAATAGCCCAGGGTAATAGAGAAACCAAAGGC (SEQ ID NO: 114) 

GR-N24:GTGAAAGAAAGGCAGATACACCAGCACAGTCACGCCAGGG (SEQ ID NO: 115) 

GR-N25:ATCAATTGAGTACCCACACGAGGGTCGAGAGCGTGGATCA (SEQ ID NO: 116) 

GR-N26:AACGCACACAGATATTCTGGTGAGTCTGCATGACACCTTT (SEQ ID NO: 117) 

GR-N27:AGGCAAACCAGTGGTGCCGGAGGAGCACAAAATAGCGGCC (SEQ ID NO: 118) 
GR-N28:ACTTGCTCGACTGGATCAAAATGCAAGGGCTTGAAATTAG (SEQ ID NO: 119) 

GR-N29:CGATATTACCATCGCTGTAACGAGAGATGAAGTTAGGGAG (SEQ ID NO: 120) 

GR-N30:GCTCTCACAGCCGTGGATGTTTTCGACGGTATCCAGAATA (SEQ ID NO: 121) 

GR-N31:ATGATGCGTTTGATGAAGTTAGTACGAGACTGGACTTCCA (SEQ ID NO: 122) 

GR-N32:AGACCTTATTCAAGATGTTTTTGGTAGTAAAGACGATTTG (SEQ ID NO: 123) 

GR-N33:AGGTTTGCTAATACCCATAACCTTACACAGCTCATCTGGG (SEQ ID NO: 124) 

GR-N34:ATGTAAGATTCATTCACAGGGGCGACAATCATGCCGATGT (SEQ ID NO: 125) 

GR-N35:ACCAAGCAGCGATTACAGGAATAAAGAAGCGAGTGTTATT (SEQ ID NO: 126) 

GR-N36:CTCAGCACAAATGCTCACCACATCGTTCATTTTGTACCCA (SEQ ID NO: 127) 

GR-N37:CAATTATGGAGGCTTTGGGCCAACAGCACAGTAGCTTCGA (SEQ ID NO: 128) 

GR-N38:AAAATTCTTTGTAGGAGAGGCTCTCGTCTCCCACGACGTC (SEQ ID NO: 129) 

GR-N39:CACGAGTGCTTGAGGGAGGTGACTATGTTTACGCAGTGCT (SEQ ID NO: 130) 

GR-N40:CGGAAGAGCATCTCACCAGCGGTGAGGTCTTCCAGTGGAT (SEQ ID NO: 131) 

GR-N41:GCAGTGGTTCTGGGCCGTAGATCACGTTCTTTTCGCGTTT (SEQ ID NO: 132) 

GR-N42:CATCATGGGATCCTGTTTCCTGTGTGAAATTGTTATCCGC (SEQ ID NO: 133) 
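
The coding and non-coding oligos listed above appear to be staggered so that each non-coding oligo bridges two adjacent coding oligos. As a minimal sketch of how that relationship might be checked, the following Python fragment tests whether the reverse complement of GR-N41 equals the last 20 nucleotides of GR-C1 followed by the first 20 nucleotides of GR-C2. The 20-nucleotide half-overlap is an assumption read off the listed sequences, not a stated design rule, and the helper names `reverse_complement` and `bridges` are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


# Sequences copied from the GR oligo lists above (SEQ ID NOs: 50, 51, 132).
GR_C1 = "GGAAACAGGATCCCATGATGAAACGCGAAAAGAACGTGAT"
GR_C2 = "CTACGGCCCAGAACCACTGCATCCACTGGAAGACCTCACC"
GR_N41 = "GCAGTGGTTCTGGGCCGTAGATCACGTTCTTTTCGCGTTT"


def bridges(non_coding: str, upstream: str, downstream: str, half: int = 20) -> bool:
    """True if non_coding pairs with the 3' end of upstream and the 5' end of
    downstream. The half-overlap length is an assumed parameter."""
    return reverse_complement(non_coding) == upstream[-half:] + downstream[:half]


print(bridges(GR_N41, GR_C1, GR_C2))
```

The same check could be applied pairwise along either list to flag oligos whose transcribed sequence no longer matches its partner strand.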

RDver5 with flanking sequence of pRAM to end of Sfi I primers 

1) coding strand 

RD-C1: GGAAACAGGATCCCATGATGAAGCGTGAGAAAAATGTCAT (SEQ ID NO: 134) 
RD-C2: CTATGGCCCTGAGCCTCTCCATCCTTTGGAGGATTTGACT (SEQ ID NO: 135) 
RD-C3: GCCGGCGAAATGCTGTTTCGTGCTCTCCGCAAGCACTCTC (SEQ ID NO: 136) 
RD-C4: ATTTGCCTCAAGCCTTGGTCGATGTGGTCGGGGATGAATC (SEQ ID NO: 137) 
RD-C5: TTTGAGCTACAAGGAGTTTTTTGAGGCAACCGTCTTGCTG (SEQ ID NO: 138) 
RD-C6: GCTCAGTCCCTCCACAATTGTGGCTACAAGATGAACGACG (SEQ ID NO: 139) 
RD-C7: TCGTTAGTATCTGTGCTGAAAACAATACCCGTTTCTTCAT (SEQ ID NO: 140) 
RD-C8: TCCAGTCATCGCCGCATGGTATATCGGTATGATCGTGGCT (SEQ ID NO: 141) 
RD-C9: CCAGTCAACGAGAGCTACATTCCCGACGAACTGTGTAAAG (SEQ ID NO: 142) 
RD-C10:TCATGGGTATCTCTAAGCCACAGATTGTCTTCACCACTAA (SEQ ID NO: 143) 
RD-C11:GAATATTCTGAACAAAGTCCTGGAAGTCCAAAGCCGCACC (SEQ ID NO: 144) 
RD-C12:AACTTTATTAAGCGTATCATCATCTTGGACACTGTGGAGA (SEQ ID NO: 145) 
RD-C13:ATATTCACGGTTGCGAATCTTTGCCTAATTTCATCTCTCG (SEQ ID NO: 146) 
RD-C14:CTATTCAGACGGCAACATCGCAAACTTTAAACCACTCCAC (SEQ ID NO: 147) 
RD-C15:TTCGACCCTGTGGAACAAGTTGCAGCCATTCTGTGTAGCA (SEQ ID NO: 148) 
RD-C16:GCGGTACTACTGGACTCCCAAAGGGAGTCATGCAGACCCA (SEQ ID NO: 149) 
RD-C17:TCAAAACATTTGCGTGCGTCTGATCCATGCTCTCGATCCA (SEQ ID NO: 150) 
RD-C18:CGCTACGGCACTCAGCTGATTCCTGGTGTCACCGTCTTGG (SEQ ID NO: 151) 
RD-C19:TCTACTTGCCTTTCTTCCATGCTTTCGGCTTTCATATTAC (SEQ ID NO: 152) 
RD-C20:TTTGGGTTACTTTATGGTCGGTCTCCGCGTGATTATGTTC (SEQ ID NO: 153) 
RD-C21:CGCCGTTTTGATCAGGAGGCTTTCTTGAAAGCCATCCAAG (SEQ ID NO: 154) 
RD-C22:ATTATGAAGTCCGCAGTGTCATCAACGTGCCTAGCGTGAT (SEQ ID NO: 155) 
RD-C23:CCTGTTTTTGTCTAAGAGCCCACTCGTGGACAAGTACGAC (SEQ ID NO: 156) 
RD-C24:TTGTCTTCACTGCGTGAATTGTGTTGCGGTGCCGCTCCAC (SEQ ID NO: 157) 
RD-C25:TGGCTAAGGAGGTCGCTGAAGTGGCCGCCAAACGCTTGAA (SEQ ID NO: 158) 
RD-C26:TCTTCCAGGGATTCGTTGTGGCTTCGGCCTCACCGAATCT (SEQ ID NO: 159) 
RD-C27:ACCAGCGCTATTATTCAGTCTCTCCGCGATGAGTTTAAGA (SEQ ID NO: 160) 
RD-C28:GCGGCTCTTTGGGCCGTGTCACTCCACTCATGGCTGCTAA (SEQ ID NO: 161) 
RD-C29:GATCGCTGATCGCGAAACTGGTAAGGCTTTGGGCCCTAAC (SEQ ID NO: 162) 
RD-C30:CAAGTGGGCGAGCTGTGTATCAAAGGCCCTATGGTGAGCA (SEQ ID NO: 163) 
RD-C31:AGGGTTATGTCAATAACGTCGAAGCTACCAAGGAGGCCAT (SEQ ID NO: 164) 
RD-C32:CGACGACGACGGCTGGTTGCATTCTGGTGATTTTGGATAT (SEQ ID NO: 165) 
RD-C33:TACGACGAAGATGAGCATTTTTACGTCGTGGATCGTTACA (SEQ ID NO: 166) 
RD-C34:AGGAGCTGATCAAATACAAGGGTAGCCAGGTTGCTCCAGC (SEQ ID NO: 167) 
RD-C35:TGAGTTGGAGGAGATTCTGTTGAAAAATCCATGCATTCGC (SEQ ID NO: 168) 
RD-C36:GATGTCGCTGTGGTCGGCATTCCTGATCTGGAGGCCGGCG (SEQ ID NO: 169) 
RD-C37:AACTGCCTTCTGCTTTCGTTGTCAAGCAGCCTGGTAAAGA (SEQ ID NO: 170) 
RD-C38:AATTACCGCCAAAGAAGTGTATGATTACCTGGCTGAACGT (SEQ ID NO: 171) 
RD-C39:GTGAGCCATACTAAGTACTTGCGTGGCGGCGTGCGTTTTG (SEQ ID NO: 172) 
RD-C40:TTGACTCCATCCCTCGTAACGTAACAGGCAAAATTACCCG (SEQ ID NO: 173) 
RD-C41:CAAGGAGCTGTTGAAACAATTGTTGGAGAAGGCCGGCGGT (SEQ ID NO: 174) 
RD-C42:TAGTAAAGTCTTCATGATTATATAGAAAAAAAAGCTAGTG (SEQ ID NO: 175) 

2) non-coding strand 

RD-N1: TAATCATGAAGACTTTACTAACCGCCGGCCTTCTCCAACA (SEQ ID NO: 176) 
RD-N2: ATTGTTTCAACAGCTCCTTGCGGGTAATTTTGCCTGTTAC (SEQ ID NO: 177) 
RD-N3: GTTACGAGGGATGGAGTCAACAAAACGCACGCCGCCACGC (SEQ ID NO: 178) 
RD-N4: AAGTACTTAGTATGGCTCAGACGTTCAGCCAGGTAATCAT (SEQ ID NO: 179) 
RD-N5: ACACTTCTTTGGCGGTAATTTCTTTACCAGGCTGCTTGAC (SEQ ID NO: 180) 
RD-N6: AACGAAAGCAGAAGGCAGTTCGCCGGCCTCCAGATCAGGA (SEQ ID NO: 181) 
RD-N7: ATGCCGACCACAGCGACATCGCGAATGCATGGATTTTTCA (SEQ ID NO: 182) 
RD-N8: ACAGAATCTCCTCCAACTCAGCTGGAGCAACCTGGCTACC (SEQ ID NO: 183) 
RD-N9: CTTGTATTTGATCAGCTCCTTGTAACGATCCACGACGTAA (SEQ ID NO: 184) 
RD-N10:AAATGCTCATCTTCGTCGTAATATCCAAAATCACCAGAAT (SEQ ID NO: 185) 
RD-N11:GCAACCAGCCGTCGTCGTCGATGGCCTCCTTGGTAGCTTC (SEQ ID NO: 186) 
RD-N12:GACGTTATTGACATAACCCTTGCTCACCATAGGGCCTTTG (SEQ ID NO: 187) 
RD-N13:ATACACAGCTCGCCCACTTGGTTAGGGCCCAAAGCCTTAC (SEQ ID NO: 188) 
RD-N14:CAGTTTCGCGATCAGCGATCTTAGCAGCCATGAGTGGAGT (SEQ ID NO: 189) 
RD-N15:GACACGGCCCAAAGAGCCGCTCTTAAACTCATCGCGGAGA (SEQ ID NO: 190) 
RD-N16:GACTGAATAATAGCGCTGGTAGATTCGGTGAGGCCGA (SEQ ID NO: 191) 
RD-N17:AGCCACAACGAATCCCTGGAAGATTCAAGCGTTTGGCGGCCAC (SEQ ID NO: 192) 
RD-N18:TTCAGCGACCTCCTTAGCCAGTGGAGCGGCACCGCAACAC (SEQ ID NO: 193) 
RD-N19:AATTCACGCAGTGAAGACAAGTCGTACTTGTCCACGAGTG (SEQ ID NO: 194) 
RD-N20:GGCTCTTAGACAAAAACAGGATCACGCTAGGCACGTTGAT (SEQ ID NO: 195) 
RD-N21:GACACTGCGGACTTCATAATCTTGGATGGCTTTCAAGAAA (SEQ ID NO: 196) 
RD-N22:GCCTCCTGATCAAAACGGCGGAACATAATCACGCGGAGAC (SEQ ID NO: 197) 
RD-N23:CGACCATAAAGTAACCCAAAGTAATATGAAAGCCGAAAGC (SEQ ID NO: 198) 
RD-N24:ATGGAAGAAAGGCAAGTAGACCAAGACGGTGACACCAGGA (SEQ ID NO: 199) 
RD-N25:ATCAGCTGAGTGCCGTAGCGTGGATCGAGAGCATGGATCA (SEQ ID NO: 200) 
RD-N26:GACGCACGCAAATGTTTTGATGGGTCTGCATGACTCCCTT (SEQ ID NO: 201) 
RD-N27:TGGGAGTCCAGTAGTACCGCTGCTACACAGAATGGCTGCA (SEQ ID NO: 202) 
RD-N28:ACTTGTTCCACAGGGTCGAAGTGGAGTGGTTTAAAGTTTG (SEQ ID NO: 203) 
RD-N29:CGATGTTGCCGTCTGAATAGCGAGAGATGAAATTAGGCAA (SEQ ID NO: 204) 
RD-N30:AGATTCGCAACCGTGAATATTCTCCACAGTGTCCAAGATG (SEQ ID NO: 205) 
RD-N31:ATGATACGCTTAATAAAGTTGGTGCGGCTTTGGACTTCCA (SEQ ID NO: 206) 
RD-N32:GGACTTTGTTCAGAATATTCTTAGTGGTGAAGACAATCTG (SEQ ID NO: 207) 
RD-N33:TGGCTTAGAGATACCCATGACTTTACACAGTTCGTCGGGA (SEQ ID NO: 208) 
RD-N34:ATGTAGCTCTCGTTGACTGGAGCCACGATCATACCGATAT (SEQ ID NO: 209) 
RD-N35:ACCATGCGGCGATGACTGGAATGAAGAAACGGGTATTGTT (SEQ ID NO: 210) 
RD-N36:TTCAGCACAGATACTAACGACGTCGTTCATCTTGTAGCCA (SEQ ID NO: 211) 
RD-N37:CAATTGTGGAGGGACTGAGCCAGCAAGACGGTTGCCTCAA (SEQ ID NO: 212) 
RD-N38:AAAACTCCTTGTAGCTCAAAGATTCATCGCCGACCACATC (SEQ ID NO: 213) 
RD-N39:GACCAAGGCTTGAGGCAAATGAGAGTGCTTGCGGAGAGCA (SEQ ID NO: 214) 
RD-N40:CGAAACAGCATTTCGCCGGCAGTCAAATCCTCCAAAGGAT (SEQ ID NO: 215) 
RD-N41:GGAGAGGCTCAGGGCCATAGATGACATTTTTCTCACGCTT (SEQ ID NO: 216) 
RD-N42:CATCATGGGATCCTGTTTCCTGTGTGAAATTGTTATCCGC (SEQ ID NO: 217) 
RELLUC.SEQ    AYLEPFKEKGEVRRPTLSWPREIPLVKGGKPDVVQIVRNY 
RLUCVER1.SEQ  AYLEPFKEKGEVRRPTLSWPREIPLVKGGKPDVVQIVRNY 
RLUCVER2.SEQ  AYLEPFKEKGEVRRPTLSWPREIPLVKGGKPDVVQIVRNY 
RLUCFINL.SEQ  AYLEPFKEKGEVRRPTLSWPREIPLVKGGKPDVVQIVRNY 

RELLUC.SEQ    NAYLRASDDLPKMFIESDPGFFSNAIVEGAKKFPNTEFVK 
RLUCVER1.SEQ  NAYLRASDDLPKMFIESDPGFFSNAIVEGAKKFPNTEFVK 
RLUCVER2.SEQ  NAYLRASDDLPKMFIESDPGFFSNAIVEGAKKFPNTEFVK 
RLUCFINL.SEQ  NAYLRASDDLPKMFIESDPGFFSNAIVEGAKKFPNTEFVK 

RELLUC.SEQ    VKGLHFSQEDAPDEMGKYIKSFVERVLKNEQ 
RLUCVER1.SEQ  VKGLHFSQEDAPDEMGKYIKSFVERVLKNEQ 
RLUCVER2.SEQ  VKGLHFSQEDAPDEMGKYIKSFVERVLKNEQ 
RLUCFINL.SEQ  VKGLHFSQEDAPDEMGKYIKSFVERVLKNEQ 
Figure 9A 

Codon usage in RELLUC 

(Renilla reniformis; Genbank ACCESSION:M63501; Medline:91239583) 
Figure 9B 

Codon Usage in Rluc-final 

Figure 10 

Oligonucleotides for the assembly of synthetic Renilla luciferase gene 



Sense Strand 
Oligo name 

RLS1 (1-40) 
RLS2 (41-80) 
RLS3 (81-120) 
RLS4 (121-170) 
RLS5 (171-210) 
RLS6 (211-250) 
RLS7 (251-290) 
RLS8 (291-330) 
RLS9 (331-370) 
RLS10 (371-410) 
RLS11 (411-450) 
RLS12 (451-495) 
RLS13 (496-535) 
RLS14 (536-575) 
RLS15 (576-620) 
RLS16 (621-660) 
RLS17 (661-700) 
RLS18 (701-740) 
RLS19 (741-780) 
RLS20 (781-820) 
RLS21 (821-860) 
RLS22 (861-900) 
RLS23 (901-949) 

Anti-sense Strand 

Oligo name 
RLAS1 (1-29) 
RLAS2 (30-69) 
RLAS3 (70-109) 
RLAS4 (110-149) 
RLAS5 (150-189) 
RLAS6 (190-229) 
RLAS7 (230-269) 
RLAS8 (270-309) 
RLAS9 (310-349) 
RLAS10 (350-394) 
RLAS11 (395-434) 
RLAS12 (435-474) 
RLAS13 (475-517) 
RLAS14 (518-559) 
RLAS15 (560-599) 
RLAS16 (600-639) 
RLAS17 (640-679) 
RLAS18 (680-719) 
RLAS19 (720-764) 
RLAS20 (765-804) 
RLAS21 (805-849) 
RLAS22 (850-889) 
RLAS23 (890-929) 
RLAS24 (930-949) 



Oligo sequence from 5' to 3' 

AACCATGGCTTCCAAGGTGTACGACCCCGAGCAACGCAAA (SEQ ID NO:246) 

CGCATGATCACTGGGCCTCAGTGGTGGGCTCGCTGCAAGC (SEQ ID NO:247) 

AAATGAACGTGCTGGACTCCTTCATCAACTACTATGATTC (SEQ ID NO:248) 

CGAGAAGCACGCCGAGAACGCCGTGATTTTTCTGCATGGTAACGCTGCCT (SEQ ID NO:249) 

CCAGCTACCTGTGGAGGCACGTCGTGCCTCACATCGAGCC (SEQ ID NO:250) 

CGTGGCTAGATGCATCATCCCTGATCTGATCGGAATGGGT (SEQ ID NO:251) 

AAGTCCGGCAAGAGCGGGAATGGCTCATATCGCCTCCTGG (SEQ ID NO:252) 

ATCACTACAAGTACCTCACCGCTTGGTTCGAGCTGCTGAA (SEQ ID NO:253) 

CCTTCCAAAGAAAATCATCTTTGTGGGCCACGACTGGGGG (SEQ ID NO:254) 

GCTTGTCTGGCCTTTCACTACTCCTACGAGCACCAAGACA (SEQ ID NO:255) 

AGATCAAGGCCATCGTCCATGCTGAGAGTGTCGTGGACGT (SEQ ID NO:256) 

GATCGAGTCCTGGGACGAGTGGCCTGACATCGAGGAGGATATCGC (SEQ ID NO:257) 

CCTGATCAAGAGCGAAGAGGGCGAGAAAATGGTGCTTGAG (SEQ ID NO:258) 

AATAACTTCTTCGTCGAGACCATGCTCCCAAGCAAGATCA (SEQ ID NO:259) 

TGCGGAAACTGGAGCCTGAGGAGTTCGCTGCCTACCTGGAGCCAT (SEQ ID NO:260) 

TCAAGGAGAAGGGCGAGGTTAGACGGCCTACCCTCTCCTG (SEQ ID NO:261) 

GCCTCGCGAGATCCCTCTCGTTAAGGGAGGCAAGCCCGAC (SEQ ID NO:262) 

GTCGTCCAGATTGTCCGCAACTACAACGCCTACCTTCGGG (SEQ ID NO:263) 

CCAGCGACGATCTGCCTAAGATGTTCATCGAGTCCGACCC (SEQ ID NO:264) 

TGGGTTCTTTTCCAACGCTATTGTCGAGGGAGCTAAGAAG (SEQ ID NO:265) 

TTCCCTAACACCGAGTTCGTGAAGGTGAAGGGCCTCCACT (SEQ ID NO:266) 

TCAGCCAGGAGGACGCTCCAGATGAAATGGGTAAGTACAT (SEQ ID NO:267) 
CAAGAGCTTCGTGGAGCGCGTGCTGAAGAACGAGCAGTAATTCTAGAGC (SEQ ID NO:268) 



Oligo Sequence from 5' to 3' 

GCTCTAGAATTACTGCTCGTTCTTCAGCA (SEQ ID NO:269) 

CGCGCTCCACGAAGCTCTTGATGTACTTACCCATTTCATC (SEQ ID NO:270) 

TGGAGCGTCCTCCTGGCTGAAGTGGAGGCCCTTCACCTTC (SEQ ID NO:271) 

ACGAACTCGGTGTTAGGGAACTTCTTAGCTCCCTCGACAA (SEQ ID NO:272) 

TAGCGTTGGAAAAGAACCCAGGGTCGGACTCGATGAACAT (SEQ ID NO:273) 

CTTAGGCAGATCGTCGCTGGCCCGAAGGTAGGCGTTGTAG (SEQ ID NO:274) 

TTGCGGACAATCTGGACGACGTCGGGCTTGCCTCCCTTAA (SEQ ID NO:275) 

CGAGAGGGATCTCGCGAGGCCAGGAGAGGGTAGGCCGTCT (SEQ ID NO:276) 

AACCTCGCCCTTCTCCTTGAATGGCTCCAGGTAGGCAGCG (SEQ ID NO:277) 

AACTCCTCAGGCTCCAGTTTCCGCATGATCTTGCTTGGGAGCATG (SEQ ID NO:278) 

GTCTCGACGAAGAAGTTATTCTCAAGCACCATTTTCTCGC (SEQ ID NO:279) 

CCTCTTCGCTCTTGATCAGGGCGATATCCTCCTCGATGTC (SEQ ID NO:280) 

AGGCCACTCGTCCCAGGACTCGATCACGTCCACGACACTCTCA (SEQ ID NO:281) 

GCATGGACGATGGCCTTGATCTTGTCTTGGTGCTCGTAGGAG (SEQ ID NO:282) 

TAGTGAAAGGCCAGACAAGCCCCCCAGTCGTGGCCCACAA (SEQ ID NO:283) 

AGATGATTTTCTTTGGAAGGTTCAGCAGCTCGAACCAAGC (SEQ ID NO:284) 

GGTGAGGTACTTGTAGTGATCCAGGAGGCGATATGAGCCA (SEQ ID NO:285) 

TTCCCGCTCTTGCCGGACTTACCCATTCCGATCAGATCAG (SEQ ID NO:286) 
GGATGATGCATCTAGCCACGGGCTCGATGTGAGGCACGACGTGCC (SEQ ID NO:287) 

TCCACAGGTAGCTGGAGGCAGCGTTACCATGCAGAAAAAT (SEQ ID NO:288) 

CACGGCGTTCTCGGCGTGCTTCTCGGAATCATAGTAGTTGATGAA (SEQ ID NO:289) 

GGAGTCCAGCACGTTCATTTGCTTGCAGCGAGCCCACCAC (SEQ ID NO:290) 

TGAGGCCCAGTGATCATGCGTTTGCGTTGCTCGGGGTCGT (SEQ ID NO:291) 

ACACCTTGGAAGCCATGGTT (SEQ ID NO:292) 
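
The coordinate ranges given with the oligo names in Figure 10 can be checked for gaps or overlaps before the oligos are ordered. The following Python fragment is a minimal sketch of such a check, assuming that each strand's ranges are 1-based, inclusive, and meant to tile positions 1 through 949 without gaps. The helper name `tiles` is hypothetical; the ranges are copied from the sense and anti-sense name lists.

```python
# Ranges copied from the Figure 10 oligo names (RLS1-RLS23, RLAS1-RLAS24).
SENSE = [(1, 40), (41, 80), (81, 120), (121, 170), (171, 210), (211, 250),
         (251, 290), (291, 330), (331, 370), (371, 410), (411, 450),
         (451, 495), (496, 535), (536, 575), (576, 620), (621, 660),
         (661, 700), (701, 740), (741, 780), (781, 820), (821, 860),
         (861, 900), (901, 949)]
ANTISENSE = [(1, 29), (30, 69), (70, 109), (110, 149), (150, 189), (190, 229),
             (230, 269), (270, 309), (310, 349), (350, 394), (395, 434),
             (435, 474), (475, 517), (518, 559), (560, 599), (600, 639),
             (640, 679), (680, 719), (720, 764), (765, 804), (805, 849),
             (850, 889), (890, 929), (930, 949)]


def tiles(ranges, total):
    """True if the inclusive ranges cover 1..total with no gap or overlap."""
    expected_start = 1
    for start, end in ranges:
        if start != expected_start or end < start:
            return False
        expected_start = end + 1
    return expected_start == total + 1


print(tiles(SENSE, 949), tiles(ANTISENSE, 949))
```

Because the two strands break at different positions, each anti-sense oligo spans a junction between sense oligos, which is what allows the set to anneal into a continuous duplex.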



Figure 11 

GRVER51 . SEQ A T G A T 6 A a |a c| g[c]g a(a1a AfcjA a[c]g tIgIa T [c] T a[c|g G fc] C cfA|G A A C 40 



LUCPPLYG , SEQ A TGATGAAGAGAGAGAAAAATGTTATATATGGACCCGAAC 40 
RD1561H9.SEQ A T GAt[a1aAg[c]g[t]gAGAAAAAT Gt[c]at[c]tAT Gg[c]cc[t]ga[g]c 40 

GRVER51.SEQ C [a] C T [g] C a[t]c c |a c| t G G A A G A c1c]t[c]a c[c]g c[t]g g[t]g a[g]a T G C T 80 
LUCPPLYG, SEQ CCCTACACCCCTTGGAAGACTTAACAGCAGGAGAAATGCT 80 

rdi561H9.seqc[t]c t jc] c a[t]c c|t]t t g g a[g]g a[t]t t[g]a c[t]g c[c]g g[c]g AAATGCT 80 

GRVER51.SEQ C T T c[c]g[a]g c(a1c t\g\c g[t]a A A C A t |a g| t C a |c c| T |c1 C C [t] C a[a]g C [a] 120 
LUCPPLYG, SEQ C T T C A GGGCCCTTCGAAAACATTCTCATTTACCGCAGGCT 120 

rdi56ih9,seq[g]t t |t c| g|¥1g c[t]c t[c]c g[c1a a[g1c a[c]t c t c a t t t[g]c c[t]c a[a|g c\c\ 120 

GRVER51.SEQ [cjT[c]G t[g]g a[c]g t |c g| t[g1g g[a]g A C G a |g A G c| c T |c] T C C T a[c]a A A G 160 
lucpplyg.seqt TAGTAGATGTGTTTGGTGACGAAT C G c t t t c c t a t a a a G 160 
RD1561H9,SEQT t[g1gt[c]gAT GTg[g]t[c]gg[c]ga[t]gAAT c |t tI t Ig'a g| c Ta|c]aa[g]g 160 

GRVER51.SEQ aIaIt TTTT[c]GAAGCTAC |TGTG |CT|GT|T[G]Gc[c]cAAAGfc|cTCCA 200 
LUCPPLYG.SEQA GTTTTTTGAAGCTA cat G C CTCCTAGCGC a A A G T C T C C A 200 
RD1561H9.SEQA GTTTTTTG a[g]g c\^A C [c^^ C [t] T [g] C T [g] G c\t\c a \g T C c| c TOGA 200 

GRVER51.SEQ jT]AATTGTGG[GlTACAA[AjATGAA[c]GATGT[G]GT G |a G c|a T [t] T G [t] 240 
LUCPPLYG.SEQC AATTGTGGATACAAGATGAATGATGTAG T G T C G a T C T G C 240 
®D1561H9.SEQC A A T T G T G g\c\t ACAAGATGA a[c]g a[c]g T [c] G T |t A G t| a T C T G [t] 240 

SRVER51.SSQ G c[t]g A G A A T A a[c]a|c T c|g[c]t T [c] T T T A T T C c |t g| t[a1a T [c] G C0G 280 
::^UCPPLYG.SEQG CCGAGAATAAT a a a A G ATTTTTTATTC C O A T TATTGCAG 280 
|kD1561H9,SEQG C [t] G A^A a[c]a A T a|c C c|g[t]t T [c] T t[c]a T T C cIa g|t[c]a t[c]g C [c] G 280 

^RVERSl.SEQ C T T G G T A[cjA T [c] G g(c]a T G A T T G T [c] G C [c] C C T G t[g]a A T G A a |t c| 320 
iS^UCPPLYG . SEQ CTTGGTATATTGGTATGATTGTAGCACCTGTTAATGAAAG 320 
r'RD1561H9.SEQc[A|T G G T A T A T [c] G G T A T G A T [c] G T [g] G 0 00 c[a1g t[c]a a[c]g a[g]a G 320 

^feRVERSl.SEQ TTACATCCCAGATG a\g\c t[g]tGTAAGGt[t]aTGGGTA T |t A G c| 360 
;:tuCPPLYG. SEQ TTACATCCCAGATGAACTCTGTAAGGTCATGGGTATATCG 360 
^^J^D1561H9.SEq[c]t A C A T [t] C c\c\g a\c\g A A C T |g] T G T A a[a]g TCATGGGTA t[c]t 00 360 

^^RVERSl.SEQ A A A C c[t]c A A A t[c]g t[c]t T t|a^T A o[c]a a[a]a A C A t(c]t t[g]a A T A 400 
^4uCPPLYG.SEQ A AACCACAAATAGTTT T T T G T A CAAAGAACATTTTAAATA 400 
RD1561H9,SEQA a[g]c C A C a[g]a t[t]g t[c]t T |c A C cj A c[t]a A G A A0A T t[c]t[g]a a[c]a 400 

GRVER51.SEQ A G G t[c]t T G G a[a}g T [c] C A G |t C T cI gRa C T A a[c]t T C A t[c]a A a(c]g 440 
LUCPPLYG.SEQA G G T at T GGAGGTACAGAGCAGAACTAATTTCATAA a a A G 440 
RD1561H9.SEQa[a1g t |c c| t G G a[a]g T0C A^A G c[c]g|c]a c[c]a a[c]t t[t]a t[t]a a |g c| g 440 

GRVER51.SEQ [c]aT CAt[t]at[t]ct[g]gATAc[c]g t[c]gaAAA CAt[c]cACGg[c]t G T 480 
LUCPPLYG. SEQ GATCATCAT A C TTGATACTGTAGAAAACATACACGGTTGT 480 
RD1561H9.SEq[t]a T C A T C A t |c t| t[g]g a[c]a C T G T [g] G a[g]a a[t]a T (t] C a C G G T T g[c] 480 

GRVER51.SEQ G aIgJa G [c] C T [c] C c[t]a a[c]t t[c]a T [c] T C T C G T T a|c A G c]g A T G G [t] A 520 
LUCPPLYG.SEQG a a AG T CTTCCCAATTTTATTTCTCGTTATTCGGATGGAA 520 
RD1561H9.SEQG A a|t c|t[t]t[g]c c[t]aA T T t[c]a t[c]t C T C g[c]t AT T c[A]GA[cjG g[c]a 520 

GRVER51.SEQ ATA t[c]g c[t]a a[t]t T C A a[g|c C [c] T T [g] C A T t |t t| g A T C cjAjG t\c\g A 560 
LUCPPLYG.SEQA TATTGCCAACTTCAAAC C T T T ACATTACGATCCTGTTGA 560 
RD1561H9.SEQa[c]a t[c]g c[a]a ACT t[t]a A A C c|a c|t[c]c a[c]t[t]c G a[c]c C T G T [g] G A 560 



Figure 11 (Cont.) 

GRVER51.SEQ GCAAGTGGc[c]gCTAt[t]tt[g]tg[c1tc[c]tc[c]gGCAc|c]aCTGG0 
LUCPPLYG, SEQ GCAAGTGGCAGCTAT C T TATGTTCGTCAGGCACTACTGGA 600 
RD1561H9.SEq[a]c A A G t[t]g GAG c[c]a t |t c| T [g| T G t |a G C A G c| g g[t]a CTACTGGA 



600 



GRVER51.SEQ Tt[g]cc[t]aAAGGTGt[^ATGCa[g]aCTCACCa[g]aATAt[c]tGTG 640 
LUCPPLYG. SEQ TTACCGAAAGGTGTAATGCAAACTCACCAAAATATTTGTG 640 
RD1561H9.SEq[c]t[c]c c[A]AA[G]GG[AjGT[c]ATGCA[G]Ac[c]cAlT]cAAAA[c]ATTTG[c]G 640 

GRVER51.SEQ T [g] C G T [g] A t[c]c a[c]g C t[c]t[c]gaC C C G [t] G G G [t| A C [t] C A 680 
LUCPPLYG. SEQ T CCGACTTATACATGCTTTAGACC C CA G GGCA G G A A C G C A 680 

rd1561H9,seqt[g]c g[t]c t\g\a t [c] c A t g c t [c] t [c] g a[t]c c |a c| g |c t a c| g g[c]a c[t]c a 680 

GRVER51.SEQ a[t]t[g]at[c]c C TGg[c]g T GAc[t1g t |g c| t[g]g t[g]tATCTGCCTTt[c] 720 
LUCPPLYG. SEQ A CTTATTCCTGGTGTGACAGTCTTAGTATATCTGCCTTTT 720 
RD1561H9.SEq[g]c t[g]a TTCCTGGTG T [c] A C [c] G T C T T [g] G T [c] T a |c t| t G C C T T T [c] 720 

GRVER51.SEQ T T [t] C a\c\g C [c] T T T G G [t] T T C T C T A T [t] A [c] C [c] T G G G [c] T A [t] T T C A 760 
LUCPPLYG. SEQ T TCCATGCTTTTGGGT T C T C T ATAAACTTGGGATACTTCA 7 60 
[Figure: nucleotide alignment of GRVER51.SEQ, LUCPPLYG.SEQ and RD1561H9.SEQ, continued through position 1120. The GRVER51.SEQ and RD1561H9.SEQ rows, which carry boxed residues, are not legible in this copy; the legible LUCPPLYG.SEQ rows follow.]

LUCPPLYG.SEQ TGGTGGGTCTTCGTGTTATCATGTTAAGACGATTTGATCA 800
LUCPPLYG.SEQ AGAAGCATTTCTAAAAGCTATTCAGGATTATGAAGTTCGA 840
LUCPPLYG.SEQ AGTGTAATTAACGTTCCAGCAATAATATTGTTCTTATCGA 880
LUCPPLYG.SEQ AAAGTCCTTTGGTTGACAAATACGATTTATCAAGTTTAAG 920
LUCPPLYG.SEQ GGAATTGTGTTGCGGTGCGGCACCATTAGCAAAAGAAGTT 960
LUCPPLYG.SEQ GCTGAGGTTGCAGTAAAACGATTAAACTTGCCAGGAATTC 1000
LUCPPLYG.SEQ GCTGTGGATTTGGTTTGACAGAATCTACTTCAGCTAATAT 1040
LUCPPLYG.SEQ ACACAGTCTTGGGGATGAATTTAAATCAGGATCACTTGGA 1080
LUCPPLYG.SEQ AGAGTTACTCCTTTAATGGCAGCTAAAATAGCAGATAGGG 1120
Figure 1^ A-
GRver5.1 DNA sequence of pGL3 vectors



ATGGTGAAACGCGAAAAGAACGTGATCTACGGCCCAGAACCACTGCATCC 50
ACTGGAAGACCTCACCGCTGGTGAGATGCTCTTCCGAGCACTGCGTAAAC 100
ATAGTCACCTCCCTCAAGCACTCGTGGACGTCGTGGGAGACGAGAgCCTC 150
TCCTACAAAGAATTTTTCGAAGCTACTGTGCTGTTGGCCCAAAGCCTCCA 200
TAATTGTGGGTACAAAATGAACGATGTGGTGAGCATTTGTGCTGAGAATA 250
ACACTCGCTTCTTTATTCCTGTAATCGCTGCTTGGTACATCGGCATGATT 300
GTCGCCCCTGTGAATGAATCTTACATCCCAGATGAGCTGTGTAAGGTTAT 350
GGGTATTAGCAAACCTCAAATCGTCTTTACTACCAAAAACATCTTGAATA 400
AGGTCTTGGAAGTCCAGTCTCGTACTAACTTCATCAAACGCATCATTATT 450
CTGGATACCGTCGAAAACATCCACGGCTGTGAGAGCCTCCCTAACTTCAT 500
CTCTCGTTACAGCGATGGTAATATCGCTAATTTCAAGCCCTTGCATTTTG 550
ATCCAGTCGAGCAAGTGGCCGCTATTTTGTGCTCCTCCGGCACCACTGGT 600
TTGCCTAAAGGTGTCATGCAGACTCACCAGAATATCTGTGTGCGTTTGAT 650
CCACGCTCTCGACCCTCGTGTGGGTACTCAATTGATCcCTGGCGTGACTG 700
TGCTGGTGTATCTGCCTTTCTTTCACGCCTTTGGTTTCTCTATTACCCTG 750
GGCTATTTCATGGTCGGCTTGCGTGTCATCATGTTTCGTCGCTTCGACCA 800
AGAAGCCTTCTTGAAGGCTATTCAAGACTACGAGGTGCGTTCCGTGATCA 850
ACGTCCCTTCAGTCATTTTGTTCCTGAGCAAATCTCCTTTGGTTGACAAG 900
TATGATCTGAGCAGCTTGCGTGAGCTGTGCTGTGGCGCTGCTCCTTTGGC 950
CAAAGAAGTGGCCGAGGTCGCTGCTAAGCGTCTGAACCTCCCTGGTATCC 1000
GCTGCGGTTTTGGTTTGACTGAGAGCACTTCTGCTAACATCCATAGCTTG 1050
CGAGACGAGTTTAAGTCTGGTAGCCTGGGTCGCGTGACTCCTCTTATGGC 1100
TGCAAAGATCGCCGACCGTGAGACCGGCAAAGCACTGGGCCCAAATCAAG 1150
TCGGTGAATTGTGTATTAAGGGCCCTATGGTCTCTAAAGGCTACGTGAAC 1200
AATGTGGAGGCCACTAAAGAAGCCATTGATGATGATGGCTGGCTCCATAG 1250
CGGCGACTTCGGTTACTATGATGAGGACGAACACTTCTATGTGGTCGATC 1300
GCTACAAAGAATTGATTAAGTACAAAGGCTCTCAAGTCGCACCAGCCGAA 1350
CTGGAAGAAATTTTGCTGAAGAACCCTTGTATCCGCGACGTGGCCGTCGT 1400
GGGTATCCCAGACTTGGAAGCTGGCGAGTTGCCTAGCGCCTTTGTGGTGA 1450
AACAACCCGGCAAGGAGATCACTGCTAAGGAGGTCTACGACTATTTGGCC 1500
GAGCGCGTGTCTCACACCAAATATCTGCGTGGCGGCGTCCGCTTCGTCGA 1550
TTCTATTCCACGCAACGTTACCGGTAAGATCACTCGTAAAGAGTTGCTGA 1600
AGCAACTCCTCGAAAAAGCTGGCGGC 1626




RDver5.1 DNA sequence of pGL3 vectors

ATGGTGAAGCGTGAGAAAAATGTCATCTATGGCCCTGAGCCTCTCCATCC 50
TTTGGAGGATTTGACTGCCGGCGAAATGCTGTTTCGTGCTCTCCGCAAGC 100
ACTCTcATTTGCCTCAAGCCTTGGTCGATGTGGTCGGCGATGAATCTTTG 150
AGCTACAAGGAGTTTTTTGAGGCAACCGTCTTGCTGGCTCAGTCCCTCCA 200
CAATTGTGGCTACAAGATGAACGACGTCGTTAGTATCTGTGCTGAAAACA 250
ATACCCGTTTCTTCATTCCAGTCATCGCCGCATGGTATATCGGTATGATC 300
GTGGCTCCAGTCAACGAGAGCTACATTCCCGACGAACTGTGTAAAGTCAT 350
GGGTATCTCTAAGCCACAGATTGTCTTCACCACTAAGAATATTCTGAACA 400
AAGTCCTGGAAGTCCAAAGCCGCACCAACTTTATTAAGCGTATCATCATC 450
TTGGACACTGTGGAGAATATTCACGGTTGCGAATCTTTGCCTAATTTCAT 500
CTCTCGCTATTCAGACGGCAACATCGCAAACTTTAAACCACTCCACTTCG 550
ACCCTGTGGAACAAGTTGCAGCCATTCTGTGTAGCAGCGGTACTACTGGA 600
CTCCCAAAGGGAGTCATGCAGACCCATCAAAACATTTGCGTGCGTCTGAT 650
CCATGCTCTCGATCCACGCTACGGCACTCAGCTGATTCCTGGTGTCACCG 700
TCTTGGTCTACTTGCCTTTCTTCCATGCTTTCGGCTTTCATATTACTTTG 750
GGTTACTTTATGGTCGGTCTCCGCGTGATTATGTTCCGCCGTTTTGATCA 800
GGAGGCTTTCTTGAAAGCCATCCAAGATTATGAAGTCCGCAGTGTCATCA 850
ACGTGCCTAGCGTGATCCTGTTTTTGTCTAAGAGCCCACTCGTGGACAAG 900
TACGACTTGTCTTCACTGCGTGAATTGTGTTGCGGTGCCGCTCCACTGGC 950
TAAGGAGGTCGCTGAAGTGGCCGCCAAACGCTTGAATCTTCCAGGGATTC 1000
GTTGTGGCTTCGGCCTCACCGAATCTACCAGCGCTATTATTCAGTCTCTC 1050
CGCGATGAGTTTAAGAGCGGCTCTTTGGGCCGTGTCACTCCACTCATGGC 1100
TGCTAAGATCGCTGATCGCGAAACTGGTAAGGCTTTGGGCCCGAACCAAG 1150
TGGGCGAGCTGTGTATCAAAGGCCCTATGGTGAGCAAGGGTTATGTCAAT 1200
AACGTTGAAGCTACCAAGGAGGCCATCGACGACGACGGCTGGTTGCATTC 1250
TGGTGATTTTGGATATTACGACGAAGATGAGCATTTTTACGTCGTGGATC 1300
GTTACAAGGAGCTGATCAAATACAAGGGTAGCCAGGTTGCTCCAGCTGAG 1350
TTGGAGGAGATTCTGTTGAAAAATCCATGCATTCGCGATGTCGCTGTGGT 1400
CGGCATTCCTGATCTGGAGGCCGGCGAACTGCCTTCTGCTTTCGTTGTCA 1450
AGCAGCCTGGTAAAGAAATTACCGCCAAAGAAGTGTATGATTACCTGGCT 1500
GAACGTGTGAGCCATACTAAGTACTTGCGTGGCGGCGTGCGTTTTGTTGA 1550
CTCCATCCCTCGTAACGTAACAGGCAAAATTACCCGCAAGGAGCTGTTGA 1600
AACAATTGTTGGAGAAGGCCGGCGGT 1626



RD1561H9 DNA sequence of pGL3 vectors 



ATGGTAAAGCGTGAGAAAAATGTCATCTATGGCCCTGAGCCTCTCCATCC 50
TTTGGAGGATTTGACTGCCGGCGAAATGCTGTTTCGTGCTCTCCGCAAGC 100
ACTCTCATTTGCCTCAAGCCTTGGTCGATGTGGTCGGCGATGAATCTTTG 150
AGCTACAAGGAGTTTTTTGAGGCAACCGTCTTGCTGGCTCAGTCCCTCCA 200
CAATTGTGGCTACAAGATGAACGACGTCGTTAGTATCTGTGCTGAAAACA 250
ATACCCGTTTCTTCATTCCAGTCATCGCCGCATGGTATATCGGTATGATC 300
GTGGCTCCAGTCAACGAGAGCTACATTCCCGACGAACTGTGTAAAGTCAT 350
GGGTATCTCTAAGCCACAGATTGTCTTCACCACTAAGAATATTCTGAACA 400
AAGTCCTGGAAGTCCAAAGCCGCACCAACTTTATTAAGCGTATCATCATC 450
TTGGACACTGTGGAGAATATTCACGGTTGCGAATCTTTGCCTAATTTCAT 500
CTCTCGCTATTCAGACGGCAACATCGCAAACTTTAAACCACTCCACTTCG 550
ACCCTGTGGAACAAGTTGCAGCCATTCTGTGTAGCAGCGGTACTACTGGA 600
CTCCCAAAGGGAGTCATGCAGACCCATCAAAACATTTGCGTGCGTCTGAT 650
CCATGCTCTCGATCCACGCTACGGCACTCAGCTGATTCCTGGTGTCACCG 700
TCTTGGTCTACTTGCCTTTCTTCCATGCTTTCGGCTTTCATATTACTTTG 750
GGTTACTTTATGGTCGGTCTCCGCGTGATTATGTTCCGCCGTTTTGATCA 800
GGAGGCTTTCTTGAAAGCCATCCAAGATTATGAAGTCCGCAGTGTCATCA 850
ACGTGCCTAGCGTGATCCTGTTTTTGTCTAAGAGCCCACTCGTGGACAAG 900
TACGACTTGTCTTCACTGCGTGAATTGTGTTGCGGTGCCGCTCCACTGGC 950
TAAGGAGGTCGCTGAAGTGGCCGCCAAACGCTTGAATCTTCCAGGGATTC 1000
GTTGTGGCTTCGGCCTCACCGAATCTACCAGTGCGATTATCCAGACTCTC 1050
GGGGATGAGTTTAAGAGCGGCTCTTTGGGCCGTGTCACTCCACTCATGGC 1100
TGCTAAGATCGCTGATCGCGAAACTGGTAAGGCTTTGGGCCCGAACCAAG 1150
TGGGCGAGCTGTGTATCAAAGGCCCTATGGTGAGCAAGGGTTATGTCAAT 1200
AACGTTGAAGCTACCAAGGAGGCCATCGACGACGACGGCTGGTTGCATTC 1250
TGGTGATTTTGGATATTACGACGAAGATGAGCATTTTTACGTCGTGGATC 1300
GTTACAAGGAGCTGATCAAATACAAGGGTAGCCAGGTTGCTCCAGCTGAG 1350
TTGGAGGAGATTCTGTTGAAAAATCCATGCATTCGCGATGTCGCTGTGGT 1400
CGGCATTCCTGATCTGGAGGCCGGCGAACTGCCTTCTGCTTTCGTTGTCA 1450
AGCAGCCTGGTACAGAAATTACCGCCAAAGAAGTGTATGATTACCTGGCT 1500
GAACGTGTGAGCCATACTAAGTACTTGCGTGGCGGCGTGCGTTTTGTTGA 1550
CTCCATCCCTCGTAACGTAACAGGCAAAATTACCCGCAAGGAGCTGTTGA 1600
AACAATTGTTGGTGAAGGCCGGCGGT 1626
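
The three coding sequences above each contain 1626 nucleotides, so any two of them can be compared position by position. The following is a minimal illustrative sketch and not part of the disclosure: the helper name `percent_identity` is hypothetical, it assumes two ungapped sequences of equal length, and the inputs are only the first 50 nucleotides of the GRver5.1 and RDver5.1 sequences shown above.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two equal-length sequences match.

    Hypothetical helper for illustration; whitespace is ignored and the
    comparison is case-insensitive. No gaps or alignment are handled.
    """
    a = "".join(seq_a.split()).upper()
    b = "".join(seq_b.split()).upper()
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# First 50 nucleotides of each sequence, copied from the figure above.
gr_row = "ATGGTGAAACGCGAAAAGAACGTGATCTACGGCCCAGAACCACTGCATCC"
rd_row = "ATGGTGAAGCGTGAGAAAAATGTCATCTATGGCCCTGAGCCTCTCCATCC"

print(f"{percent_identity(gr_row, rd_row):.1f}% identical over {len(gr_row)} nt")
```

Passing the full 1626-nucleotide strings instead of the first rows would give the identity over the whole coding region, under the same no-gap assumption.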



GRver5.1 protein sequence of pGL3 vectors 



MVKREKNVIYGPEPLHPLEDLTAGEMLFRALRKHSHLPQALVDVVGDESL 50
SYKEFFEATVLLAQSLHNCGYKMNDVVSICAENNTRFFIPVIAAWYIGMI 100
VAPVNESYIPDELCKVMGISKPQIVFTTKNILNKVLEVQSRTNFIKRIII 150
LDTVENIHGCESLPNFISRYSDGNIANFKPLHFDPVEQVAAILCSSGTTG 200
LPKGVMQTHQNICVRLIHALDPRVGTQLIPGVTVLVYLPFFHAFGFSITL 250
GYFMVGLRVIMFRRFDQEAFLKAIQDYEVRSVINVPSVILFLSKSPLVDK 300
YDLSSLRELCCGAAPLAKEVAEVAAKRLNLPGIRCGFGLTESTSANIHSL 350
RDEFKSGSLGRVTPLMAAKIADRETGKALGPNQVGELCIKGPMVSKGYVN 400
NVEATKEAIDDDGWLHSGDFGYYDEDEHFYVVDRYKELIKYKGSQVAPAE 450
LEEILLKNPCIRDVAVVGIPDLEAGELPSAFVVKQPGKEITAKEVYDYLA 500
ERVSHTKYLRGGVRFVDSIPRNVTGKITRKELLKQLLEKAGG 542



RDver5.1 protein sequence of pGL3 vectors

MVKREKNVIYGPEPLHPLEDLTAGEMLFRALRKHSHLPQALVDVVGDESL 50
SYKEFFEATVLLAQSLHNCGYKMNDVVSICAENNTRFFIPVIAAWYIGMI 100
VAPVNESYIPDELCKVMGISKPQIVFTTKNILNKVLEVQSRTNFIKRIII 150
LDTVENIHGCESLPNFISRYSDGNIANFKPLHFDPVEQVAAILCSSGTTG 200
LPKGVMQTHQNICVRLIHALDPRYGTQLIPGVTVLVYLPFFHAFGFHITL 250
GYFMVGLRVIMFRRFDQEAFLKAIQDYEVRSVINVPSVILFLSKSPLVDK 300
YDLSSLRELCCGAAPLAKEVAEVAAKRLNLPGIRCGFGLTESTSAIIQSL 350
RDEFKSGSLGRVTPLMAAKIADRETGKALGPNQVGELCIKGPMVSKGYVN 400
NVEATKEAIDDDGWLHSGDFGYYDEDEHFYVVDRYKELIKYKGSQVAPAE 450
LEEILLKNPCIRDVAVVGIPDLEAGELPSAFVVKQPGKEITAKEVYDYLA 500
ERVSHTKYLRGGVRFVDSIPRNVTGKITRKELLKQLLEKAGG 542



RD1561H9 protein sequence of pGL3 vectors 

MVKREKNVIYGPEPLHPLEDLTAGEMLFRALRKHSHLPQALVDVVGDESL 50
SYKEFFEATVLLAQSLHNCGYKMNDVVSICAENNTRFFIPVIAAWYIGMI 100
VAPVNESYIPDELCKVMGISKPQIVFTTKNILNKVLEVQSRTNFIKRIII 150
LDTVENIHGCESLPNFISRYSDGNIANFKPLHFDPVEQVAAILCSSGTTG 200
LPKGVMQTHQNICVRLIHALDPRYGTQLIPGVTVLVYLPFFHAFGFHITL 250
GYFMVGLRVIMFRRFDQEAFLKAIQDYEVRSVINVPSVILFLSKSPLVDK 300
YDLSSLRELCCGAAPLAKEVAEVAAKRLNLPGIRCGFGLTESTSAIIQTL 350
GDEFKSGSLGRVTPLMAAKIADRETGKALGPNQVGELCIKGPMVSKGYVN 400
NVEATKEAIDDDGWLHSGDFGYYDEDEHFYVVDRYKELIKYKGSQVAPAE 450
LEEILLKNPCIRDVAVVGIPDLEAGELPSAFVVKQPGTEITAKEVYDYLA 500
ERVSHTKYLRGGVRFVDSIPRNVTGKITRKELLKQLLVKAGG 542
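
The protein sequences above correspond to the DNA sequences of the same names, read codon by codon with the standard genetic code. The sketch below is illustrative only and not part of the disclosure: the function name `translate` and the way the codon table is built are assumptions introduced here, and the input is just the first 50 nucleotides of the GRver5.1 DNA sequence shown earlier.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence into one-letter amino acid codes.

    Hypothetical helper for illustration. Whitespace is ignored, case is
    ignored, a trailing partial codon is dropped, and any codon that is
    not in the table (for example one containing an OCR error) becomes 'X'.
    """
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


# First 50 nucleotides of the GRver5.1 DNA sequence (16 complete codons).
first_row = "ATGGTGAAACGCGAAAAGAACGTGATCTACGGCCCAGAACCACTGCATCC"
print(translate(first_row))  # MVKREKNVIYGPEPLH
```

The printed residues match the start of the GRver5.1 protein sequence above. Because unreadable codons are reported as 'X', the same helper can flag positions where a scanned nucleotide row and its protein row disagree.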
SEQUENCE LISTING

<110> Wood, Keith V.
Gruber, Monika G.
Zhuang, Yao

<120> SYNTHETIC NUCLEIC ACID MOLECULE COMPOSITIONS AND METHODS OF PREPARATION



<130> 341.005US1 



<160> 302 



<170> FastSEQ for Windows Version 4.0 



<210> 1 
<211> 1629 
<212> DNA 

<213> Pyrophorus plagiophthalamus 



<400> 1
atgatgaaga gagagaaaaa tgttatatat ggacccgaac ccctacaccc cttggaagac 60
ttaacagcag gagaaatgct cttcagggcc cttcgaaaac attctcattt accgcaggct 120
ttagtagatg tgtttggtga cgaatcgctt tcctataaag agttttttga agctacatgc 180
ctcctagcgc aaagtctcca caattgtgga tacaagatga atgatgtagt gtcgatctgc 240
gccgagaata ataaaagatt ttttattccc attattgcag cttggtatat tggtatgatt 300
gtagcacctg ttaatgaaag ttacatccca gatgaactct gtaaggtcat gggtatatcg 360
aaaccacaaa tagttttttg tacaaagaac attttaaata aggtattgga ggtacagagc 420
agaactaatt tcataaaaag gatcatcata cttgatactg tagaaaacat acacggttgt 480
gaaagtcttc ccaattttat ttctcgttat tcggatggaa atattgccaa cttcaaacct 540
ttacattacg atcctgttga gcaagtggca gctatcttat gttcgtcagg cactactgga 600
ttaccgaaag gtgtaatgca aactcaccaa aatatttgtg tccgacttat acatgcttta 660
gaccccaggg caggaacgca acttattcct ggtgtgacag tcttagtata tctgcctttt 720
ttccatgctt ttgggttctc tataaacttg ggatacttca tggtgggtct tcgtgttatc 780
atgttaagac gatttgatca agaagcattt ctaaaagcta ttcaggatta tgaagttcga 840
agtgtaatta acgttccagc aataatattg ttcttatcga aaagtccttt ggttgacaaa 900
tacgatttat caagtttaag ggaattgtgt tgcggtgcgg caccattagc aaaagaagtt 960
gctgaggttg cagtaaaacg attaaacttg ccaggaattc gctgtggatt tggtttgaca 1020
gaatctactt cagctaatat acacagtctt ggggatgaat ttaaatcagg atcacttgga 1080
agagttactc ctttaatggc agctaaaata gcagataggg aaactggtaa agcattggga 1140
ccaaatcaag ttggtgaatt atgcgttaaa ggtcccatgg tatcgaaagg ttacgtgaac 1200
aatgtagaag ctaccaaaga agctattgat gatgatggtt ggcttcactc tggagacttt 1260
ggatactatg atgaggatga gcatttctat gtggtggacc gttacaagga attgattaaa 1320
tataagggct ctcaggtagc acctgcagaa ctagaagaga ttttattgaa aaatccatgt 1380
atcagagatg ttgctgtggt tggtattcct gatctagaag ctggagaact gccatctgcg 1440
tttgtggtta aacagcccgg aaaggagatt acagctaaag aagtgtacga ttatcttgcc 1500
gagagggtct cccatacaaa gtatttgcgt ggaggggttc gattcgttga tagcatacca 1560
aggaatgtta caggtaaaat tacaagaaag gaacttctga agcagttgct ggagaagagt 1620
tctaaactt 1629



<210> 2 
<211> 1626 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Sequence of clone YG#81-6G01 



<400> 2
atgatgaagc gagagaaaaa tgttatatat ggacccgaac ccctacaccc cttggaagac 60
ttaacagctg gagaaatgct cttccgtgcc cttcgaaaac attctcattt accgcaggct 120
ttagtagatg tggttggcga cgaatcgctt tcctataaag agttttttga agcgacagtc 180
ctcctagcgc aaagtctcca caattgtgga tacaagatga atgatgtagt gtcgatctgc 240
gccgagaata atacaagatt ttttattccc gttattgcag cttggtatat tggtatgatt 300
gtagcacctg ttaatgaaag ttacatccca gatgaactct gtaaggtgat gggtatatcg 360
aaaccacaaa tagtttttac gacaaagaac attttaaata aggtattgga ggtacagagc 420
agaactaatt tcataaaaag gatcatcata cttgatactg tagaaaacat acacggttgt 480
gaaagtcttc ccaattttat ttctcgttat tcggatggaa atattgccaa cttcaaacct 540
ttacatttcg atcctgttga gcaagtggca gctatcttat gttcgtcagg cactactgga 600
ttaccgaaag gtgtaatgca aactcaccaa aatatttgtg tccgacttat acatgcttta 660
gaccccaggg caggaacgca acttattcct ggtgtgacag tcttagtata tctgcctttt 720
ttccatgctt ttgggttctc tataaccttg ggatacttca tggtgggtct tcgtgttatc 780
atgttcagac gatttgatca agaagcattt ctaaaagcta ttcaggatta tgaagttcga 840
agtgtaatta acgttccatc agtaatattg ttcttatcga aaagtccttt ggttgacaaa 900
tacgatttat caagtttaag ggaattgtgt tgcggtgcgg caccattagc aaaagaagtt 960
gctgaggttg cagcaaaacg attaaacttg ccaggaattc gctgtggatt tggtttgaca 1020
gaatctactt cagctaatat acacagtctt agggatgaat ttaaatcagg atcacttgga 1080
agagttactc ctttaatggc agctaaaata gcagataggg aaactggtaa agcattggga 1140
ccaaatcaag ttggtgaatt atgcattaaa ggtcccatgg tatcgaaagg ttacgtgaac 1200
aatgtagaag ctaccaaaga agctattgat gatgatggtt ggcttcactc tggagacttt 1260
ggatactatg atgaggatga gcatttctat gtggtggacc gttacaagga attgattaaa 1320
tataagggct ctcaggtagc acctgcagaa ctagaagaga ttttattgaa aaatccatgt 1380
atcagagatg ttgctgtggt tggtattcct gatctagaag ctggagaact gccatctgcg 1440
tttgtggtta aacagcccgg aaaggagatt acagctaaag aagtgtacga ttatcttgcc 1500
gagagggtct cccatacaaa gtatttgcgt ggaggggttc gattcgttga tagcatacca 1560
aggaatgtta caggtaaaat tacaagaaag gaacttctga agcagttgct ggagaaggcg 1620
ggaggt 1626

<210> 3
<211> 1626
<212> DNA
<213> Artificial Sequence
<220>
<223> Sequence of a synthetic luciferase
<400> 3
atgatgaaac gcgaaaagaa cgtcatctac ggcccagagc ctctgcaccc attggaagac 60
ctgaccgccg gtgagatgtt gttccgtgct ctgcgtaaac attctcactt gcctcaagcc 120
ctggtggatg tcgtgggcga cgaaagcttg tcttataagg agtttttcga agctactgtc 180
ctgttggccc agtctctgca taattgcggt tacaaaatga acgatgtggt cagcatttgt 240
gctgagaata acacccgctt tttcatccca gtgattgccg cttggtacat cggcatgatt 300
gtcgcccctg tgaatgaatc ttatatccca gacgagttgt gcaaggtcat gggtattagc 360
aaacctcaaa tcgtgtttac taccaagaac attctgaata aagtcttgga agtgcagtct 420
cgtactaact tcatcaagcg cattatcatt ctggataccg tcgagaatat ccacggctgt 480
gaaagcttgc caaactttat ttctcgttat agcgacggta atatcgctaa cttcaagcct 540
ctgcattttg atccagtgga gcaagtcgcc gctattttgt gctctagcgg cactaccggt 600
ctgcctaaag gcgtgatgca gactcaccaa aatatctgtg tccgcttgat tcatgccctg 660
gacccacgtg tgggtaccca gttgatccct ggcgtgactg tcctggtgta cttgccattc 720
tttcacgcct tcggtttttc tattaccctg ggctatttca tggtcggttt gcgcgtgatc 780
atgtttcgtc gcttcgatca agaagctttt ctgaaggcca ttcaggacta cgaggtccgt 840
agcgtgatca acgtcccttc tgtgattttg ttcctgagca aatctccatt ggtcgataag 900
tatgacctga gctctttgcg cgaactgtgc tgtggcgctg cccctttggc taaagaggtg 960
gccgaagtcg ctgccaagcg tctgaatttg ccaggtatcc gctgcggctt tggtctgact 1020
gagagcacct ctgctaacat tcatagcttg cgtgatgaat tcaaatctgg cagcctgggt 1080
cgcgtgactc ctttgatggc cgctaagatc gccgaccgtg agaccggcaa agctctgggt 1140
ccaaatcaag tcggcgaatt gtgtattaag ggtcctatgg tgtctaaagg ctacgtcaac 1200
aatgtggagg ccactaagga agctatcgat gacgatggtt ggctgcacag cggcgacttt 1260
ggttattacg atgaggacga acatttctat gtcgtggatc gctacaaaga gttgattaag 1320
tataaaggct ctcaggtcgc cccagctgag ctggaagaga tcttgctgaa gaacccttgc 1380
attcgtgacg tggccgtcgt gggtatccca gatttggaag ctggcgagct gcctagcgcc 1440
tttgtcgtga aacaaccagg taaggaaatt accgctaaag aggtctacga ctatttggcc 1500
gaacgcgtgt ctcacactaa gtacctgcgt ggcggtgtcc gcttcgtgga tagcatccct 1560
cgcaatgtca ccggcaaaat tactcgtaag gagttgctga aacagttgct ggaaaaggct 1620
ggtggc 1626



<210> 4 

<211> 1626 

<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Sequence of a synthetic luciferase 



<400> 4
atgatgaaac gcgaaaagaa cgtcatctac ggcccagagc ctctgcaccc attggaagac 60
ctgaccgctg gtgagatgtt gttccgtgct ctgcgtaaac attctcactt gcctcaagcc 120
ctggtcgatg tcgtgggcga cgagagcttg tcttataagg aatttttcga agctactgtc 180
ctgttggccc aatctctgca taattgcggt tacaaaatga acgatgtggt cagcatttgt 240
gctgagaata acacccgctt tttcatccca gtgattgccg cttggtacat cggcatgatt 300
gtcgcccctg tgaatgaatc ttatatccca gacgagttgt gcaaggtcat gggtattagc 360
aaacctcaaa tcgtgtttac taccaagaac attctgaata aggtcttgga agtgcagtct 420
cgtactaact tcatcaagcg cattatcatt ctggataccg tcgagaatat ccacggctgt 480
gagagcttgc caaactttat ttctcgttat agcgacggta atatcgctaa cttcaagcct 540
ctgcattttg atccagtgga gcaagtcgcc gctattttgt gctctagcgg caccaccggt 600
ctgcctaaag gcgtgatgca gactcaccaa aatatctgtg tccgcttgat tcatgccctg 660
gacccacgtg tgggtactca gttgatccct ggcgtgactg tcctggtgta cttgccattc 720
tttcacgcct tcggtttttc tattaccctg ggctatttca tggtcggttt gcgcgtgatc 780
atgtttcgtc gcttcgatca agaagccttt ctgaaggcca ttcaagacta cgaggtccgt 840
agcgtgatca acgtcccttc tgtgattttg ttcctgagca aatctccatt ggtcgataag 900
tatgacctga gcagcttgcg cgaactgtgc tgtggcgctg cccctttggc taaagaggtg 960
gccgaagtcg ctgccaagcg tctgaatttg ccaggtatcc gctgcggctt tggtctgact 1020
gagagcacct ctgctaacat tcatagcttg cgtgatgagt tcaaatctgg cagcctgggt 1080
cgcgtgactc ctttgatggc cgctaagatc gccgaccgtg agaccggcaa agctctgggt 1140
ccaaatcaag tcggcgaatt gtgtattaag ggtcctatgg tgtctaaagg ctacgtcaac 1200
aatgtggagg ccactaagga agctattgat gacgatggtt ggctgcacag cggcgacttt 1260
ggttattacg atgaggacga acatttctat gtcgtcgatc gctacaaaga gttgattaag 1320
tataaaggct ctcaagtcgc cccagctgag ctggaagaaa tcttgctgaa gaacccttgc 1380
attcgtgacg tggccgtcgt gggtatccca gatttggaag ctggcgagct gcctagcgcc 1440
tttgtcgtga aacaaccagg caaggaaatt accgctaaag aggtctacga ctatttggcc 1500
gagcgcgtgt ctcacactaa gtacctgcgt ggcggtgtcc gcttcgtcga tagcatccct 1560
cgcaatgtca ccggcaaaat tactcgtaag gagttgctga aacagttgct ggaaaaggct 1620
ggtggc 1626



<210> 5 
<211> 1626 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Sequence of a synthetic luciferase 



<400> 5
atgatgaaac gcgaaaagaa cgtgatctac ggcccagaac cactgcatcc actggaagac 60
ctcaccgctg gtgagatgct gttccgtgcc ctgcgtaaac atagccacct gcctcaagct 120
ctcgtggacg tcgtgggtga cgagagcctg tcttacaaag aatttttcga agctactgtg 180
ctgttggccc aaagcctgca taattgtggt tacaaaatga acgatgtggt gagcatctgt 240
gctgagaata acactcgctt ttttatccct gtgatcgctg cttggtacat cggcatgatt 300
gtcgcccctg tgaatgaatc ttacatccca gatgagttgt gtaaggtgat gggtattagc 360
aaacctcaaa tcgtctttac taccaaaaac atcctgaata aggtcttgga agtccagtct 420
cgtactaatt tcatcaaacg cattattatt ctggataccg tcgaaaacat ccacggctgt 480
gagagcttgc ctaactttat ctctcgttac agcgatggta atatcgctaa tttcaagcca 540
ctgcattttg atccagtcga gcaggtcgcc gccattttgt gctcttctgg caccactggt 600
ttgcctaaag gtgtcatgca gactcaccag aatatctgtg tgcgcttgat ccacgccctc 660
gaccctcgtg tgggtactca attgatccct ggcgtgactg tgctggtgta tttgcctttc 720
tttcacgcct ttggtttttc tatcaccctg ggctatttca tggtcggctt gcgtgtgatc 780
atgtttcgtc gcttcgacca agaagccttc ctgaaggcta ttcaagacta cgaggtgcgt 840
tctgtgatca atgtcccatc tgtcattttg ttcctgagca aatctccttt ggttgacaag 900
tatgatctga gcagcttgcg tgaactgtgc tgtggcgctg ctcctttggc caaagaagtg 960
gccgaggtcg ctgctaagcg tctgaacctc cctggtatcc gctgcggttt tggtttgact 1020
gagagcactt ctgccaacat ccatagcttg cgtgacgagt ttaaatctgg tagcctgggt 1080
cgcgtgaccc ctttgatggc tgcaaagatc gccgaccgtg agaccggcaa agccctgggc 1140
ccaaatcagg tcggtgaatt gtgcattaag ggccctatgg tctctaaagg ctacgtgaac 1200
aatgtggagg ccactaaaga agctattgat gatgatggtt ggttgcatag cggcgacttc 1260
ggttattatg atgaggacga acacttctat gtggtcgatc gctataaaga attgattaag 1320
tacaaaggct ctcaagtcgc cccagctgaa ctggaagaaa ttttgctgaa gaacccttgt 1380
attcgcgacg tggccgtcgt gggtatccca gacttggaag ctggcgagtt gcctagcgcc 1440
tttgtggtga aacaacctgg caaggagatt actgctaagg aggtctacga ctatttggcc 1500
gagcgcgtgt ctcacactaa atatctgcgt ggcggcgtcc gcttcgtcga ttctatccct 1560
cgcaacgtca ccggcaagat cactcgtaaa gagttgctga aacaattgct cgaaaaagct 1620
ggcggc 1626

<210> 6 
<211> 1626 
<212> DNA

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase 
<400> 6 

atgatgaaac gcgaaaagaa cgtgatctac ggcccagaac cactgcatcc actggaagac 60
ctcaccgctg gtgagatgct cttccgtgca ctgcgtaaac atagtcacct ccctcaagct 120
ctcgtggacg tcgtgggaga cgagagcctc tcttacaaag aatttttcga agctactgtg 180
ctgttggccc aaagcctcca taattgtgga tacaaaatga acgatgtggt gagcatttgt 240
gctgagaata acactcgctt ctttatccct gttatcgctg cttggtacat cggcatgatt 300
gtcgcccctg tgaatgaatc ttacatccca gatgagctgt gtaaggttat gggtattagc 360
aaacctcaaa tcgtctttac taccaaaaat atcctgaata aggtcttgga agtccagtct 420
cgtactaact tcatcaaacg catcattatt ctggataccg tcgaaaacat ccatggctgt 480
gagagcctgc ctaacttcat ctctcgttac agcgatggta atatcgctaa tttcaaacca 540
ctgcattttg atccagtcga gcaagtggcc gctattttgt gctcttccgg caccactggt 600
ttgcctaaag gtgtcatgca gactcaccag aatatctgtg tgcgtttgat ccacgctctc 660
gaccctcgtg tgggtactca attgatccct ggcgtgactg tgctggtgta tctgcctttc 720
tttcacgcct ttggtttttc tattaccctg ggctatttca tggtcggctt gcgtgtcatc 780
atgtttcgtc gcttcgacca agaagccttc ttgaaggcta ttcaagacta cgaggtgcgt 840
tctgtcatca atgtcccttc agtcattttg ttcctgagca aatctccttt ggttgacaag 900
tatgatctga gcagcttgcg tgagctgtgc tgtggcgctg ctcctttggc caaagaagtg 960
gccgaggtcg ctgctaagcg tctgaacctc cctggtatcc gctgcggttt tggtttgact 1020
gagagcactt ctgctaacat ccatagcttg cgagacgagt ttaagtctgg tagcctgggt 1080
cgcgtgactc ctcttatggc tgcaaagatc gccgaccgtg agaccggcaa agcactgggc 1140
ccaaatcaag tcggtgaatt gtgtattaag ggccctatgg tctctaaagg ctacgtgaac 1200
aatgtggagg ccactaaaga agccattgat gatgatggct ggctccatag cggcgacttc 1260
ggttactatg atgaggacga acacttctat gtggtcgatc gctacaaaga attgattaag 1320
tacaaaggct ctcaagtcgc cccagccgaa ctggaagaaa ttttgctgaa gaacccttgt 1380
atccgcgacg tggccgtcgt gggtatccca gacttggaag ctggtgagtt gcctagcgcc 1440
tttgtggtga aacaacctgg aaaggagatc actgctaagg aggtctacga ctatttggcc 1500
gagcgcgtgt ctcacaccaa atatctgcgt ggcggcgtcc gcttcgtcga ttccatccca 1560
cgcaacgtga ccggtaagat cactcgtaaa gaattgctga agcaactcct cgaaaaagct 1620
ggcggc 1626

<210> 7 
<211> 1626 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase 
<400> 7 

atgatgaaac gcgaaaagaa cgtgatctac ggcccagaac cactgcatcc actggaagac 60
ctcaccgctg gtgagatgct cttccgagca ctgcgtaaac atagtcacct ccctcaagca 120
ctcgtggacg tcgtgggaga cgagagcctc tcctacaaag aatttttcga agctactgtg 180
ctgttggccc aaagcctcca taattgtggg tacaaaatga acgatgtggt gagcatttgt 240
gctgagaata acactcgctt ctttattcct gtaatcgctg cttggtacat cggcatgatt 300
gtcgcccctg tgaatgaatc ttacatccca gatgagctgt gtaaggttat gggtattagc 360
aaacctcaaa tcgtctttac taccaaaaac atcttgaata aggtcttgga agtccagtct 420
cgtactaact tcatcaaacg catcattatt ctggataccg tcgaaaacat ccacggctgt 480
gagagcctcc ctaacttcat ctctcgttac agcgatggta atatcgctaa tttcaagccc 540
ttgcattttg atccagtcga gcaagtggcc gctattttgt gctcctccgg caccactggt 600
ttgcctaaag gtgtcatgca gactcaccag aatatctgtg tgcgtttgat ccacgctctc 660
gaccctcgtg tgggtactca attgatccct ggcgtgactg tgctggtgta tctgcctttc 720
tttcacgcct ttggtttctc tattaccctg ggctatttca tggtcggctt gcgtgtcatc 780
atgtttcgtc gcttcgacca agaagccttc ttgaaggcta ttcaagacta cgaggtgcgt 840
tccgtgatca acgtcccttc agtcattttg ttcctgagca aatctccttt ggttgacaag 900
tatgatctga gcagcttgcg tgagctgtgc tgtggcgctg ctcctttggc caaagaagtg 960
gccgaggtcg ctgctaagcg tctgaacctc cctggtatcc gctgcggttt tggtttgact 1020
gagagcactt ctgctaacat ccatagcttg cgagacgagt ttaagtctgg tagcctgggt 1080
cgcgtgactc ctcttatggc tgcaaagatc gccgaccgtg agaccggcaa agcactgggc 1140
ccaaatcaag tcggtgaatt gtgtattaag ggccctatgg tctctaaagg ctacgtgaac 1200
aatgtggagg ccactaaaga agccattgat gatgatggct ggctccatag cggcgacttc 1260
ggttactatg atgaggacga acacttctat gtggtcgatc gctacaaaga attgattaag 1320
tacaaaggct ctcaagtcgc accagccgaa ctggaagaaa ttttgctgaa gaacccttgt 1380
atccgcgacg tggccgtcgt gggtatccca gacttggaag ctggcgagtt gcctagcgcc 1440
tttgtggtga aacaacccgg caaggagatc actgctaagg aggtctacga ctatttggcc 1500
gagcgcgtgt ctcacaccaa atatctgcgt ggcggcgtcc gcttcgtcga ttctattcca 1560
cgcaacgtta ccggtaagat cactcgtaaa gagttgctga agcaactcct cgaaaaagct 1620
ggcggc 1626

<210> 8 
<211> 1626 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase 
<400> 8 

atgatgaaac gcgaaaagaa cgtgatctac ggcccagaac cactgcatcc actggaagac 60
ctcaccgctg gtgagatgct cttccgagca ctgcgtaaac atagtcacct ccctcaagca 120
ctcgtggacg tcgtgggaga cgagaacctc tcctacaaag aatttttcga agctactgtg 180
ctgttggccc aaagcctcca taattgtggg tacaaaatga acgatgtggt gagcatttgt 240
gctgagaata acactcgctt ctttattcct gtaatcgctg cttggtacat cggcatgatt 300
gtcgcccctg tgaatgaatc ttacatccca gatgagctgt gtaaggttat gggtattagc 360
aaacctcaaa tcgtctttac taccaaaaac atcttgaata aggtcttgga agtccagtct 420
cgtactaact tcatcaaacg catcattatt ctggataccg tcgaaaacat ccacggctgt 480
gagagcctcc ctaacttcat ctctcgttac agcgatggta atatcgctaa tttcaagccc 540
ttgcattttg atccagtcga gcaagtggcc gctattttgt gctcctccgg caccactggt 600
ttgcctaaag gtgtcatgca gactcaccag aatatctgtg tgcgtttgat ccacgctctc 660
gaccctcgtg tgggtactca attgatctct ggcgtgactg tgctggtgta tctgcctttc 720
tttcacgcct ttggtttctc tattaccctg ggctatttca tggtcggctt gcgtgtcatc 780
atgtttcgtc gcttcgacca agaagccttc ttgaaggcta ttcaagacta cgaggtgcgt 840
tccgtgatca acgtcccttc agtcattttg ttcctgagca aatctccttt ggttgacaag 900
tatgatctga gcagcttgcg tgagctgtgc tgtggcgctg ctcctttggc caaagaagtg 960
gccgaggtcg ctgctaagcg tctgaacctc cctggtatcc gctgcggttt tggtttgact 1020
gagagcactt ctgctaacat ccatagcttg cgagacgagt ttaagtctgg tagcctgggt 1080
cgcgtgactc ctcttatggc tgcaaagatc gccgaccgtg agaccggcaa agcactgggc 1140
ccaaatcaag tcggtgaatt gtgtattaag ggccctatgg tctctaaagg ctacgtgaac 1200
aatgtggagg ccactaaaga agccattgat gatgatggct ggctccatag cggcgacttc 1260
ggttactatg atgaggacga acacttctat gtggtcgatc gctacaaaga attgattaag 1320
tacaaaggct ctcaagtcgc accagccgaa ctggaagaaa ttttgctgaa gaacccttgt 1380
atccgcgacg tggccgtcgt gggtatccca gacttggaag ctggcgagtt gcctagcgcc 1440
tttgtggtga aacaacccgg caaggagatc actgctaagg aggtctacga ctatttggcc 1500
gagcgcgtgt ctcacaccaa atatctgcgt ggcggcgtcc gcttcgtcga ttctattcca 1560
cgcaacgtta ccggtaagat cactcgtaaa gagttgctga agcaactcct cgaaaaagct 1620
ggcggc 1626

<210> 9 
<211> 1626 
<212> DNA

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase 
<400> 9 

atgatgaaac gcgaaaagaa cgtgatctac ggcccagaac cactgcatcc actggaagac 60
ctcaccgctg gtgagatgct cttccgagca ctgcgtaaac atagtcacct ccctcaagca 120
ctcgtggacg tcgtgggaga cgagagcctc tcctacaaag aatttttcga agctactgtg 180
ctgttggccc aaagcctcca taattgtggg tacaaaatga acgatgtggt gagcatttgt 240
gctgagaata acactcgctt ctttattcct gtaatcgctg cttggtacat cggcatgatt 300
gtcgcccctg tgaatgaatc ttacatccca gatgagctgt gtaaggttat gggtattagc 360
aaacctcaaa tcgtctttac taccaaaaac atcttgaata aggtcttgga agtccagtct 420
cgtactaact tcatcaaacg catcattatt ctggataccg tcgaaaacat ccacggctgt 480
gagagcctcc ctaacttcat ctctcgttac agcgatggta atatcgctaa tttcaagccc 540
ttgcattttg atccagtcga gcaagtggcc gctattttgt gctcctccgg caccactggt 600
ttgcctaaag gtgtcatgca gactcaccag aatatctgtg tgcgtttgat ccacgctctc 660
gaccctcgtg tgggtactca attgatccct ggcgtgactg tgctggtgta tctgcctttc 720
tttcacgcct ttggtttctc tattaccctg ggctatttca tggtcggctt gcgtgtcatc 780
atgtttcgtc gcttcgacca agaagccttc ttgaaggcta ttcaagacta cgaggtgcgt 840
tccgtgatca acgtcccttc agtcattttg ttcctgagca aatctccttt ggttgacaag 900
tatgatctga gcagcttgcg tgagctgtgc tgtggcgctg ctcctttggc caaagaagtg 960
gccgaggtcg ctgctaagcg tctgaacctc cctggtatcc gctgcggttt tggtttgact 1020
gagagcactt ctgctaacat ccatagcttg cgagacgagt ttaagtctgg tagcctgggt 1080
cgcgtgactc ctcttatggc tgcaaagatc gccgaccgtg agaccggcaa agcactgggc 1140
ccaaatcaag tcggtgaatt gtgtattaag ggccctatgg tctctaaagg ctacgtgaac 1200
aatgtggagg ccactaaaga agccattgat gatgatggct ggctccatag cggcgacttc 1260
ggttactatg atgaggacga acacttctat gtggtcgatc gctacaaaga attgattaag 1320
tacaaaggct ctcaagtcgc accagccgaa ctggaagaaa ttttgctgaa gaacccttgt 1380
atccgcgacg tggccgtcgt gggtatccca gacttggaag ctggcgagtt gcctagcgcc 1440
tttgtggtga aacaacccgg caaggagatc actgctaagg aggtctacga ctatttggcc 1500
gagcgcgtgt ctcacaccaa atatctgcgt ggcggcgtcc gcttcgtcga ttctattcca 1560
cgcaacgtta ccggtaagat cactcgtaaa gagttgctga agcaactcct cgaaaaagct 1620
ggcggc 1626



<210> 10 
<211> 1626 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase 
<400> 10 

atgatgaagc gtgagaaaaa tgtgatttat ggtcctgaac cattgcatcc tctggaggat 60
ttgactgctg gcgaaatgct gtttcgcgcc ttgcgcaagc acagccatct gccacaggct 120
ttggtcgacg tggtcggtga tgagtctctg agctacaaag aattctttga ggccaccgtg 180
ttgctggctc aaagcttgca caactgtggc tataagatga atgacgtcgt gtctatctgc 240
gccgaaaaca atactcgttt ctttattcct gtcatcgctg cctggtatat tggtatgatc 300
gtggctccag tcaacgagag ctacattcct gatgaactgt gtaaagtgat gggcatctct 360
aagccacaga ttgtcttcac cactaaaaat atcttgaaca aggtgctgga ggtccaaagc 420
cgcaccaatt ttattaaacg tatcattatc ttggacactg tggaaaacat tcatggttgc 480
gagtctctgc ctaatttcat cagccgctac tctgatggca acattgccaa ttttaaacca 540
ttgcacttcg accctgtcga acaggtggct gccatcctgt gtagctctgg taccactggc 600
ttgccaaagg gtgtcatgca aacccatcag aacatttgcg tgcgtctgat ccacgctctc 660
gatcctcgct acggcactca actgattcca ggtgtcaccg tgttggtcta tctgcctttt 720
ttccatgctt ttggcttcca catcactttg ggttacttta tggtgggcct gcgtgtcatt 780
atgttccgcc gttttgacca ggaggccttc ttgaaagcta tccaagatta tgaagtgcgc 840
tctgtcatta atgtgccaag cgtcatcctg tttttgtcta agagccctct ggtggacaaa 900
tacgatttgt ctagcctgcg tgagttgtgt tgcggtgccg ctccactggc caaggaagtc 960
gctgaggtgg ccgctaaacg cttgaacctg cctggcattc gttgtggttt cggcttgacc 1020
gaatctacta gcgccattat ccaatctctg cgcgacgagt ttaagagcgg ttctttgggc 1080
cgtgtcaccc cactgatggc tgccaaaatt gctgatcgcg aaactggtaa ggccttgggc 1140
cctaaccagg tgggtgagct gtgcatcaaa ggcccaatgg tcagcaaggg ttatgtgaat 1200
aacgtcgaag ctaccaaaga ggccattgac gatgacggct ggttgcattc tggtgatttc 1260
ggctactatg acgaagatga gcacttttac gtggtcgacc gttataagga actgatcaaa 1320
tacaagggta gccaagtggc tcctgccgaa ttggaggaaa ttctgttgaa aaatccatgt 1380
atccgcgatg tcgctgtggt cggcattcct gacctggagg ccggtgaatt gccatctgct 1440
ttcgtggtca agcagcctgg caaagagatc actgccaagg aagtgtatga ttacctggct 1500
gagcgtgtca gccataccaa atatttgcgc ggtggcgtgc gttttgtcga ctctattcca 1560
cgtaacgtga ctggtaagat cacccgcaaa gaactgttga agcaactgtt ggagaaagcc 1620
ggcggt 1626

<210> 11 
<211> 1626 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase 
<400> 11 

atgatgaagc gtgagaaaaa tgtgatttat ggtcctgaac cattgcatcc tctggaggat 60
ttgactgccg gcgaaatgct gtttcgcgcc ttgcgcaagc acagccatct gccacaagct 120
ttggtggacg tggtcggtga tgaatctctg agctacaaag agttctttga ggcaaccgtg 180
ttgctggctc agagcttgca caactgtggc tataagatga atgacgtcgt gtctatctgc 240
gccgaaaaca atactcgttt ctttattcct gtcatcgctg cctggtatat tggtatgatc 300
gtggctccag tcaacgagag ctacattcct gatgaactgt gtaaagtgat gggcatctct 360
aagccacaga ttgtcttcac cactaaaaat atcttgaaca aagtgctgga ggtccaaagc 420
cgcaccaatt ttattaaacg tatcattatc ttggacactg tggaaaacat tcatggttgc 480
gaatctctgc ctaatttcat cagccgctac tctgatggca acattgccaa ttttaaacca 540
ttgcacttcg accctgtcga acaggtggct gccatcctgt gtagctctgg tactactggc 600
ttgccaaagg gtgtcatgca aacccatcag aacatttgcg tgcgtctgat ccacgctctc 660
gatcctcgct acggcaccca actgattcct ggtgtcaccg tgttggtcta tctgcctttt 720
ttccatgctt ttggcttcca catcactttg ggttacttta tggtgggcct gcgtgtcatt 780
atgttccgcc gttttgacca ggaggctttc ttgaaagcta tccaagatta tgaagtgcgc 840
tctgtcatta atgtgccaag cgtcatcctg tttttgtcta agagccctct ggtggacaaa 900
tacgatttgt cttctctgcg tgagttgtgt tgcggtgccg ctccactggc caaggaagtc 960
gctgaggtgg ccgctaaacg cttgaacctg cctggcattc gttgtggttt cggcttgacc 1020
gaatctacta gcgccattat ccaatctctg cgcgacgaat ttaagagcgg ttctttgggc 1080
cgtgtcaccc cactgatggc tgccaaaatt gctgatcgcg aaactggtaa ggccttgggc 1140
cctaaccagg tgggtgagct gtgcatcaaa ggcccaatgg tcagcaaggg ttatgtgaat 1200
aacgtcgaag ctaccaaaga ggccatcgac gatgacggct ggttgcattc tggtgatttc 1260
ggctactatg acgaagatga gcacttttac gtggtggacc gttataagga actgatcaaa 1320
tacaagggta gccaagtggc tcctgccgaa ttggaggaga ttctgttgaa aaatccatgt 1380
atccgcgatg tcgctgtggt cggcattcct gacctggagg ccggtgaatt gccatctgct 1440
ttcgtggtca agcagcctgg taaagagatc actgccaagg aagtgtatga ttacctggct 1500
gaacgtgtca gccataccaa atatttgcgc ggtggcgtgc gttttgtgga ctctattcca 1560
cgtaacgtga ctggtaagat cacccgcaaa gaactgttga agcaactgtt ggagaaagcc 1620
ggcggt 1626

<210> 12 
<211> 1626 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase
<400> 12 

atgatgaagc gtgagaaaaa tgtcatctat ggccctgagc ctttgcaccc tttggaggat 60 

ttgactgccg gcgaaatgct gtttcgcgct ttgcgtaagc actctcattt gcctcaagcc 120 

ttggtcgatg tggtcggcga tgaatctttg agctataagg agttttttga ggcaaccgtc 180 

ttgctggctc agtctttgca taattgcggc tacaagatga acgacgtcgt ctctatttgt 240 

gccgaaaaca atacccgttt cttcattcca gtcatcgccg cctggtatat cggtatgatc 300 

gtggctccag tcaacgagag ctacattcct gacgaactgt gtaaagtcat gggtatctct 360 

aagccacaga ttgtgttcac cactaagaat attttgaaca aagtgctgga agtccaaagc 420 

cgcaccaact ttattaagcg tatcatcatc ttggacactg tggagaatat tcatggttgc 480 

gaatctctgc ctaatttcat tagccgctat tctgacggca acatcgccaa ctttaaacct 540 

ttgcatttcg accctgtgga acaagtggct gctatcctgt gtagcagcgg tactactggc 600

ctcccaaagg gcgtcatgca gacccatcaa aacatttgcg tgcgtctgat ccatgctctc 660 

gatccacgct acggcactca gctgattcct ggtgtcaccg tcttggtcta cctgcctttc 720 

ttccatgctt tcggcttcca cattactttg ggttacttta tggtcggtct gcgtgtcatt 780 

atgttccgcc gttttgatca ggaggctttt ttgaaagcca tccaagatta tgaagtccgc 840 

agcgtcatta acgtgcctag cgtgatcctg tttttgtcta agagcccact cgtggacaag 900 

tacgacttgt cttccctgcg tgagttgtgt tgcggtgccg ccccactggc taaggaggtc 960 

gctgaagtgg ccgccaaacg cttgaatctg ccaggcattc gttgtggctt cggcctcacc 1020 

gaatctacca gcgctattat tcaatctctc cgcgatgagt ttaagagcgg ctctttgggc 1080 

cgtgtcactc cactcatggc tgctaaaatc gctgatcgcg aaactggtaa ggctttgggc 1140 

cctaaccaag tgggcgagct gtgtatcaaa ggccctatgg tgagcaaggg ttatgtcaat 1200

aacgtcgaag ctaccaagga ggccatcgac gacgacggct ggctgcattc tggtgatttt 1260 

ggctactacg acgaagatga gcatttttac gtcgtggatc gttacaagga gctgatcaaa 1320 

tacaagggta gccaggtggc tccagccgag ttggaggaga ttctgttgaa aaatccatgc 1380

atccgtgatg tcgctgtggt cggcattcct gatctggagg ccggtgaact gccttctgct 1440 

ttcgtcgtca agcagcctgg taaagaaatc accgccaaag aagtgtatga ttacctggct 1500 

gaacgtgtga gccataccaa gtacttgcgt ggcggcgtgc gttttgtgga cagcattcca 1560 



cgtaatgtga ctggtaaaat tacccgcaag gaactgttga agcaattgtt ggagaaggcc 1620

ggcggt 1626

<210> 13 
<211> 1626 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Sequence of a synthetic luciferase 



<400> 13 

atgatgaagc gtgagaaaaa tgtcatctat ggccctgagc ctttgcatcc tttggaggat 60

ttgactgccg gcgaaatgct gtttcgtgct ttgcgtaaac actctcattt gcctcaagcc 120

ttggtcgatg tggtcggcga tgaatctttg agctacaagg agttttttga ggcaaccgtc 180

ttgctggctc agtccttgca taattgtggc tacaagatga acgacgtcgt ctccatttgt 240

gcagaaaaca atacccgttt cttcattcca gtcatcgccg catggtatat cggtatgatc 300

gtggctccag tcaacgagag ctacattccc gacgaactgt gtaaagtcat gggtatctct 360

aagccacaga ttgtcttcac cactaagaat attctgaaca aagtcctgga agtccaaagc 420

cgcaccaact ttattaagcg tatcatcatc ttggacactg tggagaatat tcacggttgc 480

gaatctttgc ctaattttat tagccgctat tcagacggaa acatcgccaa ctttaagcct 540

ctccatttcg accctgtgga acaagttgct gcaatcctgt gtagcagcgg tactactgga 600

ctcccaaagg gagtcatgca gacccatcaa aacatttgcg tgcgtctgat ccatgctctc 660

gatccacgct acggcactca gctgattcct ggtgtcaccg tcttggtcta cttgcctttc 720

ttccatgctt tcggcttcca tattactttg ggttacttta tggtcggtct gcgtgtgatt 780

atgttccgcc gttttgatca ggaggctttc ttgaaagcca tccaagatta tgaagtccgc 840

agtgtcatca acgtgcctag cgtgatcctg tttttgtcta agagcccact cgtggacaag 900

tacgacttgt cttcactgcg tgaattgtgt tgcggtgccg ctccactggc taaggaggtc 960

gctgaagtgg ccgccaaacg cttgaatctg cccggcattc gttgtggctt cggcctcacc 1020

gaatctacca gcgctattat tcagtctctc cgcgatgagt ttaagagcgg ctctttgggc 1080

cgtgtcactc cactcatggc tgctaagatc gctgatcgcg aaactggtaa ggctttgggc 1140

cctaaccaag tgggcgagct gtgtatcaaa ggccctatgg tgagcaaggg ttatgtcaat 1200

aacgtcgaag ctaccaagga ggctatcgac gacgacggct ggttgcattc tggtgatttt 1260

ggatattacg acgaagatga gcatttttac gtcgtggatc gttacaagga gctgatcaaa 1320

tacaagggta gccaggttgc tccagctgag ttggaggaga ttctgttgaa aaatccatgc 1380

attcgcgatg tcgctgtggt cggcattcct gatctggagg ccggcgaact gccttctgct 1440

ttcgttgtca agcagcctgg taaagaaatt accgccaaag aagtgtatga ttacctggct 1500

gaacgtgtga gccatactaa gtacttgcgt ggcggcgtgc gttttgtgga tagcattcct 1560

cgcaatgtga ctggcaaaat tacccgcaag gagctgttga aacaattgtt ggagaaggcc 1620

ggcggt 1626



<210> 14 
<211> 1626 
<212> DNA

<213> Artificial Sequence 



<220> 

<223> Sequence of a synthetic luciferase 



<400> 14 

atgatgaagc gtgagaaaaa tgtcatctat ggccctgagc ctctccatcc tttggaggat 60

ttgactgccg gcgaaatgct gtttcgtgct ctccgcaagc actctcattt gcctcaagcc 120

ttggtcgatg tggtcggcga tgaatctttg agctacaagg agttttttga ggcaaccgtc 180

ttgctggctc agtccctcca caattgtggc tacaagatga acgacgtcgt tagtatctgt 240

gctgaaaaca atacccgttt cttcattcca gtcatcgccg catggtatat cggtatgatc 300

gtggctccag tcaacgagag ctacattccc gacgaactgt gtaaagtcat gggtatctct 360

aagccacaga ttgtcttcac cactaagaat attctgaaca aagtcctgga agtccaaagc 420

cgcaccaact ttattaagcg tatcatcatc ttggacactg tggagaatat tcacggttgc 480



gaatctttgc ctaatttcat ctctcgctat tcagacggca acatcgcaaa ctttaaacca 540 

ctccacttcg accctgtgga acaagttgca gccattctgt gtagcagcgg tactactgga 600 

ctcccaaagg gagtcatgca gacccatcaa aacatttgcg tgcgtctgat ccatgctctc 660 

gatccacgct acggcactca gctgattcct ggtgtcaccg tcttggtcta cttgcctttc 720

ttccatgctt tcggctttca tattactttg ggttacttta tggtcggtct ccgcgtgatt 780 

atgttccgcc gttttgatca ggaggctttc ttgaaagcca tccaagatta tgaagtccgc 840 

agtgtcatca acgtgcctag cgtgatcctg tttttgtcta agagcccact cgtggacaag 900 

tacgacttgt cttcactgcg tgaattgtgt tgcggtgccg ctccactggc taaggaggtc 960 

gctgaagtgg ccgccaaacg cttgaatctt ccagggattc gttgtggctt cggcctcacc 1020

gaatctacca gcgctattat tcagtctctc cgcgatgagt ttaagagcgg ctctttgggc 1080 

cgtgtcactc cactcatggc tgctaagatc gctgatcgcg aaactggtaa ggctttgggc 1140 

cctaaccaag tgggcgagct gtgtatcaaa ggccctatgg tgagcaaggg ttatgtcaat 1200

aacgtcgaag ctaccaagga ggccatcgac gacgacggct ggttgcattc tggtgatttt 1260

ggatattacg acgaagatga gcatttttac gtcgtggatc gttacaagga gctgatcaaa 1320

tacaagggta gccaggttgc tccagctgag ttggaggaga ttctgttgaa aaatccatgc 1380

attcgcgatg tcgctgtggt cggcattcct gatctggagg ccggcgaact gccttctgct 1440

ttcgttgtca agcagcctgg taaagaaatt accgccaaag aagtgtatga ttacctggct 1500

gaacgtgtga gccatactaa gtacttgcgt ggcggcgtgc gttttgttga ctccatccct 1560

cgtaacgtaa caggcaaaat tacccgcaag gagctgttga aacaattgtt ggagaaggcc 1620

ggcggt 1626

<210> 15 
<211> 1626 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase
<400> 15 

atgatgaagc gtgagaaaaa tgtcatctat ggccctgagc ctctccatcc tttggaggat 60 

ttgactgccg gcgaaatgct gtttcgtgct ctccgcaagc actcttattt gcctcaagcc 120

ttggtcgatg tggtcggcga tgaatctttg agctacaagg agttttttga ggcaaccgtc 180 

ttgctggctc agtccctcca caattgtggc tacaagatga acgacgtcgt tagtatctgt 240 

gctgaaaaca atacccgttt cttcattcca gtcatcgccg catggtatat cggtatgatc 300 

gtggctccag tcaacgagag ctacattccc gacgaactgt gtaaagtcat gggtatctct 360 

aagccacaga ttgtcttcac cactaagaat attctgaaca aagtcctgga agtccaaagc 420 

cgcaccaact ttattaagcg tatcatcatc ttggacactg tggagaatat tcacggttgc 480

gaatctttgc ctaatttcat ctctcgctat tcagacggca acatcgcaaa ctttaaacca 540 

ctccacttcg accctgtgga acaagttgca gccattctgt gtagcagcgg tactactgga 600 

ctcccaaagg gagtcatgca gacccatcaa aacatttgcg tgcgtctgat ccatgctctc 660 

gatccacgct acggcactca gctgattcct ggtgtcaccg tcttggtcta cttgcctttc 720

ttccatgctt tcggctttca tattactttg ggttacttta tggtcggtct ccgcgtgatt 780

atgttccgcc gttttgatca ggaggctttc ttgaaagcca tccaagatta tgaagtccgc 840

agtgtcatca acgtgcctag cgtgatcctg tttttgtcta agagcccact cgtggacaag 900

tacgacttgt cttcactgcg tgaattgtgt tgcggtgccg ctccactggc taaggaggtc 960

gctgaagtgg ccgccaaacg cttgaatctt ccagggattc gttgtggctt cggcctcacc 1020

gaatctacca gcgctattat tcagtctctc cgcgatgagt ttaagagcgg ctctttgggc 1080

cgtgtcactc cactcatggc tgctaagatc gctgatcgcg aaactggtaa ggctttgggc 1140

ccgaaccaag tgggcgagct gtgtatcaaa ggccctatgg tgagcaaggg ttatgtcaat 1200

aacgttgaag ctaccaagga ggccatcgac gacgacggct ggttgcattc tggtgatttt 1260

ggatattacg acgaagatga gcatttttac gtcgtggatc gttacaagga gctgatcaaa 1320

tacaagggta gccaggttgc tccagctgag ttggaggaga ttctgttgaa aaatccatgc 1380

attcgcgatg tcgctgtggt cggcattcct gatctggagg ccggcgaact gccttctgct 1440

ttcgttgtca agcagcctgg taaagaaatt accgccaaag aagtgtatga ttacctggct 1500

gaacgtgtga gccatactaa gtacttgcgt ggcggcgtgc gttttgttga ctccatccct 1560

cgtaacgtaa caggcaaaat tacccgcaag gagctgttga aacaattgtt ggagaaggcc 1620

ggcggt 1626



<210> 16 
<211> 1626 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase
<400> 16 

atgatgaagc gtgagaaaaa tgtcatctat ggccctgagc ctctccatcc tttggaggat 60

ttgactgccg gcgaaatgct gtttcgtgct ctccgcaagc actctcattt gcctcaagcc 120

ttggtcgatg tggtcggcga tgaatctttg agctacaagg agttttttga ggcaaccgtc 180

ttgctggctc agtccctcca caattgtggc tacaagatga acgacgtcgt tagtatctgt 240

gctgaaaaca atacccgttt cttcattcca gtcatcgccg catggtatat cggtatgatc 300

gtggctccag tcaacgagag ctacattccc gacgaactgt gtaaagtcat gggtatctct 360

aagccacaga ttgtcttcac cactaagaat attctgaaca aagtcctgga agtccaaagc 420

cgcaccaact ttattaagcg tatcatcatc ttggacactg tggagaatat tcacggttgc 480

gaatctttgc ctaatttcat ctctcgctat tcagacggca acatcgcaaa ctttaaacca 540

ctccacttcg accctgtgga acaagttgca gccattctgt gtagcagcgg tactactgga 600

ctcccaaagg gagtcatgca gacccatcaa aacatttgcg tgcgtctgat ccatgctctc 660

gatccacgct acggcactca gctgattcct ggtgtcaccg tcttggtcta cttgcctttc 720

ttccatgctt tcggctttca tattactttg ggttacttta tggtcggtct ccgcgtgatt 780

atgttccgcc gttttgatca ggaggctttc ttgaaagcca tccaagatta tgaagtccgc 840

agtgtcatca acgtgcctag cgtgatcctg tttttgtcta agagcccact cgtggacaag 900

tacgacttgt cttcactgcg tgaattgtgt tgcggtgccg ctccactggc taaggaggtc 960 

gctgaagtgg ccgccaaacg cttgaatctt ccagggattc gttgtggctt cggcctcacc 1020 

gaatctacca gcgctattat tcagtctctc cgcgatgagt ttaagagcgg ctctttgggc 1080 

cgtgtcactc cactcatggc tgctaagatc gctgatcgcg aaactggtaa ggctttgggc 1140

ccgaaccaag tgggcgagct gtgtatcaaa ggccctatgg tgagcaaggg ttatgtcaat 1200 

aacgttgaag ctaccaagga ggccatcgac gacgacggct ggttgcattc tggtgatttt 1260

ggatattacg acgaagatga gcatttttac gtcgtggatc gttacaagga gctgatcaaa 1320

tacaagggta gccaggttgc tccagctgag ttggaggaga ttctgttgaa aaatccatgc 1380 

attcgcgatg tcgctgtggt cggcattcct gatctggagg ccggcgaact gccttctgct 1440 

ttcgttgtca agcagcctgg taaagaaatt accgccaaag aagtgtatga ttacctggct 1500 

gaacgtgtga gccatactaa gtacttgcgt ggcggcgtgc gttttgttga ctccatccct 1560 

cgtaacgtaa caggcaaaat tacccgcaag gagctgttga aacaattgtt ggagaaggcc 1620

ggcggt 1626

<210> 17 
<211> 1626 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase 
<400> 17 

atgatgaagc gtgagaaaaa tgtcatctat ggccctgagc ctctccatcc tttggaggat 60

ttgactgccg gcgaaatgct gtttcgtgct ctccgcaagc actctcattt gcctcaagcc 120

ttggtcgatg tggtcggcga tgaatctttg agctacaagg agttttttga ggcaaccgtc 180

ttgctggctc agtccctcca caattgtggc tacaagatga acgacgtcgt tagtatctgt 240 

gctgaaaaca atacccgttt cttcattcca gtcatcgccg catggtatat cggtatgatc 300 

gtggctccag tcaacgagag ctacattccc gacgaactgt gtaaagtcat gggtatctct 360 

aagccacaga ttgtcttcac cactaagaat attctgaaca aagtcctgga agtccaaagc 420

cgcaccaact ttattaagcg tatcatcatc ttggacactg tggagaatat tcacggttgc 480 

gaatctttgc ctaatttcat ctctcgctat tcagacggca acatcgcaaa ctttaaacca 540 

ctccacttcg accctgtgga acaagttgca gccattctgt gtagcagcgg tactactgga 600 

ctcccaaagg gagtcatgca gacccatcaa aacatttgcg tgcgtctgat ccatgctctc 660

gatccacgct acggcactca gctgattcct ggtgtcaccg tcttggtcta cttgcctttc 720

ttccatgctt tcggctttca tattactttg ggttacttta tggtcggtct ccgcgtgatt 780

atgttccgcc gttttgatca ggaggctttc ttgaaagcca tccaagatta tgaagtccgc 840

agtgtcatca acgtgcctag cgtgatcctg tttttgtcta agagcccact cgtggacaag 900 

tacgacttgt cttcactgcg tgaattgtgt tgcggtgccg ctccactggc taaggaggtc 960 

gctgaagtgg ccgccaaacg cttgaatctt ccagggattc gttgtggctt cggcctcacc 1020

gaatctacca gcgctattat tcagtctctc ggggatgagt ttaagagcgg ctctttgggc 1080 

cgtgtcactc cactcatggc tgctaagatc gctgatcgcg aaactggtaa ggctttgggc 1140

ccgaaccaag tgggcgagct gtgtatcaaa ggccctatgg tgagcaaggg ttatgtcaat 1200 

aacgttgaag ctaccaagga ggccatcgac gacgacggct ggttgcattc tggtgatttt 1260

ggatattacg acgaagatga gcatttttac gtcgtggatc gttacaagga gctgatcaaa 1320

tacaagggta gccaggttgc tccagctgag ttggaggaga ttctgttgaa aaatccatgc 1380

attcgcgatg tcgctgtggt cggcattcct gatctggagg ccggcgaact gccttctgct 1440

ttcgttgtca agcagcctgg taaagaaatt accgccaaag aagtgtatga ttacctggct 1500 

gaacgtgtga gccatactaa gtacttgcgt ggcggcgtgc gttttgttga ctccatccct 1560 

cgtaacgtaa caggcaaaat tacccgcaag gagctgttga aacaattgtt ggagaaggcc 1620

ggcggt 1626

<210> 18 
<211> 1626 
<212> DNA

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase 
<400> 18 

atgataaagc gtgagaaaaa tgtcatctat ggccctgagc ctctccatcc tttggaggat 60 

ttgactgccg gcgaaatgct gtttcgtgct ctccgcaagc actctcattt gcctcaagcc 120 

ttggtcgatg tggtcggcga tgaatctttg agctacaagg agttttttga ggcaaccgtc 180 

ttgctggctc agtccctcca caattgtggc tacaagatga acgacgtcgt tagtatctgt 240

gctgaaaaca atacccgttt cttcattcca gtcatcgccg catggtatat cggtatgatc 300 

gtggctccag tcaacgagag ctacattccc gacgaactgt gtaaagtcat gggtatctct 360 

aagccacaga ttgtcttcac cactaagaat attctgaaca aagtcctgga agtccaaagc 420

cgcaccaact ttattaagcg tatcatcatc ttggacactg tggagaatat tcacggttgc 480 

gaatctttgc ctaatttcat ctctcgctat tcagacggca acatcgcaaa ctttaaacca 540 

ctccacttcg accctgtgga acaagttgca gccattctgt gtagcagcgg tactactgga 600 

ctcccaaagg gagtcatgca gacccatcaa aacatttgcg tgcgtctgat ccatgctctc 660 

gatccacgct acggcactca gctgattcct ggtgtcaccg tcttggtcta cttgcctttc 720

ttccatgctt tcggctttca tattactttg ggttacttta tggtcggtct ccgcgtgatt 780 

atgttccgcc gttttgatca ggaggctttc ttgaaagcca tccaagatta tgaagtccgc 840 

agtgtcatca acgtgcctag cgtgatcctg tttttgtcta agagcccact cgtggacaag 900 

tacgacttgt cttcactgcg tgaattgtgt tgcggtgccg ctccactggc taaggaggtc 960 

gctgaagtgg ccgccaaacg cttgaatctt ccagggattc gttgtggctt cggcctcacc 1020 

gaatctacca gtgcgattat ccagactctc ggggatgagt ttaagagcgg ctctttgggc 1080 

cgtgtcactc cactcatggc tgctaagatc gctgatcgcg aaactggtaa ggctttgggc 1140 

ccgaaccaag tgggcgagct gtgtatcaaa ggccctatgg tgagcaaggg ttatgtcaat 1200

aacgttgaag ctaccaagga ggccatcgac gacgacggct ggttgcattc tggtgatttt 1260 

ggatattacg acgaagatga gcatttttac gtcgtggatc gttacaagga gctgatcaaa 1320

tacaagggta gccaggttgc tccagctgag ttggaggaga ttctgttgaa aaatccatgc 1380 

attcgcgatg tcgctgtggt cggcattcct gatctggagg ccggcgaact gccttctgct 1440 

ttcgttgtca agcagcctgg tacagaaatt accgccaaag aagtgtatga ttacctggct 1500 

gaacgtgtga gccatactaa gtacttgcgt ggcggcgtgc gttttgttga ctccatccct 1560 

cgtaacgtaa caggcaaaat tacccgcaag gagctgttga aacaattgtt ggtgaaggcc 1620 
ggcggt 1626

<210> 19
<211> 933
<212> DNA

<213> Renilla reniformis
<400> 19

atgacttcga aagtttatga tccagaacaa aggaaacgga tgataactgg tccgcagtgg 60

tgggccagat gtaaacaaat gaatgttctt gattcattta ttaattatta tgattcagaa 120

aaacatgcag aaaatgctgt tattttttta catggtaacg cggcctcttc ttatttatgg 180

cgacatgttg tgccacatat tgagccagta gcgcggtgta ttataccaga tcttattggt 240

atgggcaaat caggcaaatc tggtaatggt tcttataggt tacttgatca ttacaaatat 300

cttactgcat ggtttgaact tcttaattta ccaaagaaga tcatttttgt cggccatgat 360

tggggtgctt gtttggcatt tcattatagc tatgagcatc aagataagat caaagcaata 420

gttcacgctg aaagtgtagt agatgtgatt gaatcatggg atgaatggcc tgatattgaa 480 

gaagatattg cgttgatcaa atctgaagaa ggagaaaaaa tggttttgga gaataacttc 540

ttcgtggaaa ccatgttgcc atcaaaaatc atgagaaagt tagaaccaga agaatttgca 600 

gcatatcttg aaccattcaa agagaaaggt gaagttcgtc gtccaacatt atcatggcct 660 

cgtgaaatcc cgttagtaaa aggtggtaaa cctgacgttg tacaaattgt taggaattat 720 

aatgcttatc tacgtgcaag tgatgattta ccaaaaatgt ttattgaatc ggatccagga 780 

ttcttttcca atgctattgt tgaaggcgcc aagaagtttc ctaatactga atttgtcaaa 840 

gtaaaaggtc ttcatttttc gcaagaagat gcacctgatg aaatgggaaa atatatcaaa 900

tcgttcgttg agcgagttct caaaaatgaa caa 933

<210> 20 
<211> 933 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase 
<400> 20 

atggcttcca aggtgtacga ccccgagcag cgcaagcgca tgatcaccgg ccctcagtgg 60 

tgggcccgct gcaagcagat gaacgtgctg gactccttca tcaactacta cgacagcgag 120

aagcacgccg agaacgccgt gatcttcctg cacggcaacg ccgcctccag ctacctgtgg 180 

aggcacgtgg tgcctcacat cgagcccgtg gcccgctgca tcatccctga cctgatcggc 240 

atgggcaagt ccggcaagag cggcaacggc tcctaccgcc tgctggacca ctacaagtac 300

ctgaccgcct ggttcgagct gctgaacctg cccaagaaga tcatcttcgt gggccacgac 360 

tggggagcct gcctggcctt ccactactcc tacgagcacc aggacaagat caaggccatc 420

gtgcacgccg agagcgtggt ggacgtgatc gagtcctggg acgagtggcc tgacatcgag 480 

gaggacatcg ccctgatcaa gagcgaggag ggcgagaaga tggtgctgga gaacaacttc 540

ttcgtggaga ccatgctgcc cagcaagatc atgcgcaagc tggagcctga ggagttcgcc 600 

gcctacctgg agcccttcaa ggagaagggc gaggtgcgcc gccctaccct gtcctggccc 660 

cgcgagatcc ctctggtgaa gggcggcaag cccgacgtgg tgcagatcgt gcgcaactac 720 

aacgcctacc tgcgcgccag cgacgacctg cctaagatgt tcatcgagtc cgaccctggc 780 
ttcttctcca acgccatcgt cgagggagcc aagaagttcc ccaacaccga gttcgtgaag 840

gtgaagggcc tgcacttctc ccaggaggac gcccctgacg agatgggcaa gtacatcaag 900

agcttcgtgg agcgcgtgct gaagaacgag cag 933

<210> 21
<211> 933
<212> DNA

<213> Artificial Sequence
<220>

<223> Sequence of a synthetic luciferase
<400> 21

atggcttcca aggtgtacga ccccgagcaa cgcaaacgca tgatcactgg gcctcagtgg 60

tgggctcgct gcaagcaaat gaacgtgctg gactccttca tcaactacta tgattccgag 120

aagcacgccg agaacgccgt gatttttctg catggtaacg ctgcctccag ctacctgtgg 180

aggcacgtcg tgcctcacat cgagcccgtg gctcgctgca tcatccctga tctgatcgga 240

atgggtaagt ccggcaagag cgggaatggc tcatatcgcc tcctggatca ctacaagtac 300

ctcaccgctt ggttcgagct gctgaacctt ccaaagaaaa tcatctttgt gggccacgac 360 

tggggggctt gtctggcctt tcactactcc tacgagcacc aagacaagat caaggccatc 420 

gtccatgctg agagtgtcgt ggacgtgatc gagtcctggg acgagtggcc tgacatcgag 480 

gaggatatcg ccctgatcaa gagcgaagag ggcgagaaaa tggtgcttga gaataacttc 540 

ttcgtcgaga ccatgctccc aagcaagatc atgcggaaac tggagcctga ggagttcgct 600 

gcctacctgg agcccttcaa ggagaagggc gaggttagac ggcctaccct ctcctggcct 660 

cgcgagatcc ctctcgttaa gggaggcaag cccgacgtcg tccagattgt ccgcaactac 720

aacgcctacc ttcgggccag cgacgatctg cctaagatgt tcatcgagtc cgaccctggg 780 

ttcttttcca acgctattgt cgagggagct aagaagttcc ctaacaccga gttcgtgaag 840 

gtgaagggcc tccacttcag ccaggaggac gctccagatg aaatgggtaa gtacatcaag 900 

agcttcgtgg agcgcgtgct gaagaacgag cag 933

<210> 22 
<211> 933 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase
<400> 22 

atggcttcca aggtgtacga ccccgagcaa cgcaaacgca tgatcactgg gcctcagtgg 60 

tgggctcgct gcaagcaaat gaacgtgctg gactccttca tcaactacta tgattccgag 120

aagcacgccg agaacgccgt gatttttctg catggtaacg ctgcctccag ctacctgtgg 180 

aggcacgtcg tgcctcacat cgagcccgtg gctagatgca tcatccctga tctgatcgga 240 

atgggtaagt ccggcaagag cgggaatggc tcatatcgcc tcctggatca ctacaagtac 300 

ctcaccgctt ggttcgagct gctgaacctt ccaaagaaaa tcatctttgt gggccacgac 360 

tggggggctt gtctggcctt tcactactcc tacgagcacc aagacaagat caaggccatc 420

gtccatgctg agagtgtcgt ggacgtgatc gagtcctggg acgagtggcc tgacatcgag 480 

gaggatatcg ccctgatcaa gagcgaagag ggcgagaaaa tggtgcttga gaataacttc 540

ttcgtcgaga ccatgctccc aagcaagatc atgcggaaac tggagcctga ggagttcgct 600 

gcctacctgg agccattcaa ggagaagggc gaggttagac ggcctaccct ctcctggcct 660 

cgcgagatcc ctctcgttaa gggaggcaag cccgacgtcg tccagattgt ccgcaactac 720 

aacgcctacc ttcgggccag cgacgatctg cctaagatgt tcatcgagtc cgaccctggg 780 

ttcttttcca acgctattgt cgagggagct aagaagttcc ctaacaccga gttcgtgaag 840 

gtgaagggcc tccacttcag ccaggaggac gctccagatg aaatgggtaa gtacatcaag 900 

agcttcgtgg agcgcgtgct gaagaacgag cag 933

<210> 23 
<211> 543 
<212> PRT 

<213> Pyrophorus plagiophthalamus
<400> 23

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Phe Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Cys Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Lys Arg Phe Phe Ile Pro Ile Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Cys Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Tyr Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Ala
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe Ser Ile Asn Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Leu Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ala Ile
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Val Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Asn Ile His Ser Leu Gly Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Val Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ser Ser Lys Leu
530 535 540

<210> 24
<211> 542 
<212> PRT 

<213> Artificial Sequence 



<220> 

<223> Sequence of clone YG#81-6G01 



<400> 24 
Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Ala
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe Ser Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Asn Ile His Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400



Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540







<210> 25 
<211> 542 
<212> PRT 
<213> Artificial Sequence

<220>

<223> Sequence of a synthetic luciferase

<400> 25

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Val
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe Ser Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Asn Ile His Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540

<210> 26
<211> 542
<212> PRT

<213> Artificial Sequence
<220>

<223> Sequence of a synthetic luciferase
<400> 26

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Val
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe Ser Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Asn Ile His Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540



<210> 27 
<211> 542 
<212> PRT 

<213> Artificial Sequence 



<220> 

<223> Sequence of a synthetic luciferase 



<400> 27 
Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Val
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe Ser Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Asn Ile His Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415




Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540





<210> 28 
<211> 542 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase 
<400> 28 



Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Val
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe Ser Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Asn Ile His Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540

<210> 29 
<211> 542 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase 
<400> 29 

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Val
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe Ser Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Asn Ile His Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540



<210> 30 
<211> 542 
<212> PRT 

<213> Artificial Sequence 



<220> 

<223> Sequence of a synthetic luciferase 



<400> 30 
Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Asn Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Val
210 215 220

Gly Thr Gln Leu Ile Ser Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe Ser Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Asn Ile His Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540



<210> 31 
<211> 542 
<212> PRT 

<213> Artificial Sequence 



<220> 

<223> Sequence of a synthetic luciferase



<400> 31 
Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205








His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Val
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe Ser Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335




Phe Gly Leu Thr Glu Ser Thr Ser Ala Asn Ile His Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540










<210> 32
<211> 542
<212> PRT

<213> Artificial Sequence 



<220> 

<223> Sequence of a synthetic luciferase 
<400> 32 

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Tyr
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe His Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Ile Ile Gln Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540

<210> 33 
<211> 542 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase 
<400> 33 

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Tyr
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe His Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Ile Ile Gln Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540









<210> 34 
<211> 542 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase
<400> 34 



Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Tyr
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe His Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Ile Ile Gln Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540



<210> 35 
<211> 29 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 35 

acgccagccc aagcttaggc ctgagtggc 



<210> 36
<211> 44
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 36 

cttaattctc cccatccccc tgttgacaat taatcatcgg ctcg 

<210> 37 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 37 

tataatgtga ggaattgcga gcggataaca atttcacaca 

<210> 38 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 38 

atgggatgtt acctagacca atatgaaata tttggtaaat 

<210> 39 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 39 

aaatgcttaa tgaatttcaa aaaaaaaaaa aaaggaattc 

<210> 40 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 40 

gatatcaagc ttatcgatac cgtcgacctc gaggattata 

<210> 41 

<211> 37 

<212> DNA 

<213> Artificial Sequence 
<220>

<223> An oligonucleotide 
<400> 41 

tagaaaaagg cctcggcggc cgctagttca gtcagtt 

<210> 42 
<211> 17 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 42 

aactgactga actagcg 

<210> 43 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide
<400> 43 

gccgccgagg cctttttcta tataatcctc gaggtcgacg 

<210> 44 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 44 

gtatcgataa gcttgatatc gaattccttt tttttttttt 

<210> 45 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 45 

agcttgatat cgaattcctt tttttttttt tttgaaattc 

<210> 46 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 46

ttgaaattca ttaagcattt atttaccaaa tatttcatat 

<210> 47 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 47 

tggtctaggt aacatcccat cactagcttt tttttctata 

<210> 48 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 48 

tcgcaattcc tcacattata cgagccgatg attaattgtc 

<210> 49 
<211> 53 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 49 

aacaggggga tggggagaat taaggccact caggcctaag cttgggctgg cgt 
<210> 50 

<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 50 

ggaaacagga tcccatgatg aaacgcgaaa agaacgtgat 

<210> 51 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 51 

ctacggccca gaaccactgc atccactgga agacctcacc 
<210> 52
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 52 

gctggtgaga tgctcttccg agcactgcgt aaacatagtc 

<210> 53 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 53 

acctccctca agcactcgtg gacgtcgtgg gagacgagag 

<210> 54 

<211> 40 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 

<400> 54 

cctctcctac aaagaatttt tcgaagctac tgtgctgttg 

<210> 55 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 55 

gcccaaagcc tccataattg tgggtacaaa atgaacgatg 

<210> 56 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 56 

tggtgagcat ttgtgctgag aataacactc gcttctttat 

<210> 57 
<211> 40 
<212> DNA 
<213> Artificial Sequence
<220> 

<223> An oligonucleotide 
<400> 57 

tcctgtaatc gctgcttggt acatcggcat gattgtcgcc 

<210> 58 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 58 

cctgtgaatg aatcttacat cccagatgag ctgtgtaagg 

<210> 59 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 59 

ttatgggtat tagcaaacct caaatcgtct ttactaccaa 

<210> 60 
<211> 40 
<212> DNA

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 60 

aaacatcttg aataaggtct tggaagtcca gtctcgtact 

<210> 61 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 61 

aacttcatca aacgcatcat tattctggat accgtcgaaa 

<210> 62 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 
<223> An oligonucleotide
<400> 62 

acatccacgg ctgtgagagc ctccctaact tcatctctcg 

<210> 63 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 63 

ttacagcgat ggtaatatcg ctaatttcaa gcccttgcat 

<210> 64 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 64 

tttgatccag tcgagcaagt ggccgctatt ttgtgctcct 

<210> 65 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 65 

ccggcaccac tggtttgcct aaaggtgtca tgcagactca 

<210> 66 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 66 

ccagaatatc tgtgtgcgtt tgatccacgc tctcgaccct 

<210> 67 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 67 
cgtgtgggta ctcaattgat ccctggcgtg actgtgctgg



<210> 68 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 68 

tgtatctgcc tttctttcac gcctttggtt tctctattac 
<210> 69 

<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 69 

cctgggctat ttcatggtcg gcttgcgtgt catcatgttt 

<210> 70 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 70 

cgtcgcttcg accaagaagc cttcttgaag gctattcaag 

<210> 71 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 71 

actacgaggt gcgttccgtg atcaacgtcc cttcagtcat 

<210> 72 
<211> 43 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 72 

tttgttcctg agcaaatctc ctttggttga caagtatgat ctg 
<210> 73 
<211> 37
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 73 

agcagcttgc gtgagctgtg ctgtggcgct gctcctt 

<210> 74 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 74 

tggccaaaga agtggccgag gtcgctgcta agcgtctgaa 

<210> 75 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 75 

cctccctggt atccgctgcg gttttggttt gactgagagc 

<210> 76 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 76 

acttctgcta acatccatag cttgcgagac gagtttaagt 

<210> 77 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 77 

ctggtagcct gggtcgcgtg actcctctta tggctgcaaa 

<210> 78 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220>

<223> An oligonucleotide 
<400> 78 

gatcgccgac cgtgagaccg gcaaagcact gggcccaaat 
<210> 79 

<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 79 

caagtcggtg aattgtgtat taagggccct atggtctcta 

<210> 80 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 80 

aaggctacgt gaacaatgtg gaggccacta aagaagccat 

<210> 81 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 81 

tgatgatgat ggctggctcc atagcggcga cttcggttac 

<210> 82 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 82 

tatgatgagg acgaacactt ctatgtggtc gatcgctaca

<210> 83 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 83

aagaattgat taagtacaaa ggctctcaag tcgcaccagc 

<210> 84 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 84 

cgaactggaa gaaattttgc tgaagaaccc ttgtatccgc 

<210> 85 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 85 

gacgtggccg tcgtgggtat cccagacttg gaagctggcg 

<210> 86 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 86 

agttgcctag cgcctttgtg gtgaaacaac ccggcaagga 

<210> 87 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 87 

gatcactgct aaggaggtct acgactattt ggccgagcgc 

<210> 88 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 88 

gtgtctcaca ccaaatatct gcgtggcggc gtccgcttcg 



<210> 89 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 89 

tcgattctat tccacgcaac gttaccggta agatcactcg 

<210> 90 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 90 

taaagagttg ctgaagcaac tcctcgaaaa agctggcggc 

<210> 91 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 91 

tagtaaagtc ttcatgatta tatagaaaaa aaagctagtg 

<210> 92 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 92 

taatcatgaa gactttacta gccgccagct ttttcgagga 

<210> 93 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 93 

gttgcttcag caactcttta cgagtgatct taccggtaac 

<210> 94 
<211> 39 
<212> DNA 
<213> Artificial Sequence
<220> 

<223> An oligonucleotide 
<400> 94 

gttgcgtgga atagaatcga cgaagcggac gccgccacg 

<210> 95 
<211> 41 
<212> DNA

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 95 

cagatatttg gtgtgagaca cgcgctcggc caaatagtcg t 

<210> 96 
<211> 40 
<212> DNA

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 96 

agacctcctt agcagtgatc tccttgccgg gttgtttcac 

<210> 97 
<211> 40 
<212> DNA

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 97 

cacaaaggcg ctaggcaact cgccagcttc caagtctggg 

<210> 98 
<211> 40 
<212> DNA

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 98 

atacccacga cggccacgtc gcggatacaa gggttcttca 

<210> 99 
<211> 40 
<212> DNA

<213> Artificial Sequence 
<220> 
<223> An oligonucleotide
<400> 99 

gcaaaatttc ttccagttcg gctggtgcga cttgagagcc 

<210> 100 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 100 

tttgtactta atcaattctt tgtagcgatc gaccacatag 

<210> 101 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 101 

aagtgttcgt cctcatcata gtaaccgaag tcgccgctat 

<210> 102 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 102 

ggagccagcc atcatcatca atggcttctt tagtggcctc 

<210> 103 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 103 

cacattgttc acgtagcctt tagagaccat agggccctta 

<210> 104 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 104 
atacacaatt caccgacttg atttgggccc agtgctttgc



<210> 105 
<211> 40 
<212> DNA

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 105 

cggtctcacg gtcggcgatc tttgcagcca taagaggagt 

<210> 106 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 106 

cacgcgaccc aggctaccag acttaaactc gtctcgcaag 

<210> 107 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 107 

ctatggatgt tagcagaagt gctctcagtc aaaccaaaac 

<210> 108 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 108 

cgcagcggat accagggagg ttcagacgct tagcagcgac 

<210> 109 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 109 

ctcggccact tctttggcca aaggagcagc gccacagcac 
<210> 110 
<211> 40
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 110 

agctcacgca agctgctcag atcatacttg tcaaccaaag 

<210> 111 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 111 

gagatttgct caggaacaaa atgactgaag ggacgttgat 

<210> 112 
<211> 36
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 112 

cacggaacgc acctcgtagt cttgaatagc cttcaa 

<210> 113 

<211> 44 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 

<400> 113 

gaaggcttct tggtcgaagc gacgaaacat gatgacacgc aagc 

<210> 114 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 114 

cgaccatgaa atagcccagg gtaatagaga aaccaaaggc 

<210> 115 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220>

<223> An oligonucleotide 
<400> 115 

gtgaaagaaa ggcagataca ccagcacagt cacgccaggg 

<210> 116 

<211> 40
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 116 

atcaattgag tacccacacg agggtcgaga gcgtggatca 

<210> 117 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 117 

aacgcacaca gatattctgg tgagtctgca tgacaccttt 

<210> 118 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 118 

aggcaaacca gtggtgccgg aggagcacaa aatagcggcc 

<210> 119 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 119 

acttgctcga ctggatcaaa atgcaagggc ttgaaattag 

<210> 120 

<211> 40 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 120

cgatattacc atcgctgtaa cgagagatga agttagggag 

<210> 121 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 121 

gctctcacag ccgtggatgt tttcgacggt atccagaata 

<210> 122 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 122 

atgatgcgtt tgatgaagtt agtacgagac tggacttcca 

<210> 123 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 123 

agaccttatt caagatgttt ttggtagtaa agacgatttg 

<210> 124 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 124 

aggtttgcta atacccataa ccttacacag ctcatctggg 

<210> 125 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 125 

atgtaagatt cattcacagg ggcgacaatc atgccgatgt 
<210> 126
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 126 

accaagcagc gattacagga ataaagaagc gagtgttatt 

<210> 127 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 127 

ctcagcacaa atgctcacca catcgttcat tttgtaccca 

<210> 128 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 128 

caattatgga ggctttgggc caacagcaca gtagcttcga 

<210> 129 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 129 

aaaattcttt gtaggagagg ctctcgtctc ccacgacgtc 

<210> 130 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 130 

cacgagtgct tgagggaggt gactatgttt acgcagtgct 

<210> 131 
<211> 40 
<212> DNA 
<213> Artificial Sequence
<220> 

<223> An oligonucleotide 
<400> 131 

cggaagagca tctcaccagc ggtgaggtct tccagtggat 

<210> 132 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 132 

gcagtggttc tgggccgtag atcacgttct tttcgcgttt 

<210> 133 

<211> 40 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 

<400> 133 

catcatggga tcctgtttcc tgtgtgaaat tgttatccgc 

<210> 134 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 134 

ggaaacagga tcccatgatg aagcgtgaga aaaatgtcat 

<210> 135 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 135 

ctatggccct gagcctctcc atcctttgga ggatttgact 

<210> 136 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 



<223> An oligonucleotide 



<400> 136 

gccggcgaaa tgctgtttcg tgctctccgc aagcactctc 

<210> 137 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 137 

atttgcctca agccttggtc gatgtggtcg gcgatgaatc 

<210> 138 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 138 

tttgagctac aaggagtttt ttgaggcaac cgtcttgctg 

<210> 139 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide
<400> 139 

gctcagtccc tccacaattg tggctacaag atgaacgacg 

<210> 140 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 140 

tcgttagtat ctgtgctgaa aacaataccc gtttcttcat 

<210> 141 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 141 
tccagtcatc gccgcatggt atatcggtat gatcgtggct



<210> 142 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 142 

ccagtcaacg agagctacat tcccgacgaa ctgtgtaaag 

<210> 143 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 143 

tcatgggtat ctctaagcca cagattgtct tcaccactaa 

<210> 144 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 144 

gaatattctg aacaaagtcc tggaagtcca aagccgcacc 

<210> 145 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 145 

aactttatta agcgtatcat catcttggac actgtggaga 

<210> 146 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 146 

atattcacgg ttgcgaatct ttgcctaatt tcatctctcg 
<210> 147 
<211> 40
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 147 

ctattcagac ggcaacatcg caaactttaa accactccac 

<210> 148 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 148 

ttcgaccctg tggaacaagt tgcagccatt ctgtgtagca 

<210> 149 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 149 

gcggtactac tggactccca aagggagtca tgcagaccca 

<210> 150 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 150 

tcaaaacatt tgcgtgcgtc tgatccatgc tctcgatcca 

<210> 151 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 151 

cgctacggca ctcagctgat tcctggtgtc accgtcttgg 

<210> 152 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220>

<223> An oligonucleotide 
<400> 152 

tctacttgcc tttcttccat gctttcggct ttcatattac 

<210> 153 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 153 

tttgggttac tttatggtcg gtctccgcgt gattatgttc 

<210> 154 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 154 

cgccgttttg atcaggaggc tttcttgaaa gccatccaag 

<210> 155 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 155 

attatgaagt ccgcagtgtc atcaacgtgc ctagcgtgat 

<210> 156 

<211> 40 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 

<400> 156 

cctgtttttg tctaagagcc cactcgtgga caagtacgac 

<210> 157 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 157

ttgtcttcac tgcgtgaatt gtgttgcggt gccgctccac 

<210> 158 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 158 

tggctaagga ggtcgctgaa gtggccgcca aacgcttgaa 

<210> 159 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 159 

tcttccaggg attcgttgtg gcttcggcct caccgaatct 

<210> 160 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 160 

accagcgcta ttattcagtc tctccgcgat gagtttaaga 

<210> 161 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 161 

gcggctcttt gggccgtgtc actccactca tggctgctaa 

<210> 162 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 162 

gatcgctgat cgcgaaactg gtaaggcttt gggccctaac 
<210> 163
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 163 

caagtgggcg agctgtgtat caaaggccct atggtgagca 

<210> 164 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 164 

agggttatgt caataacgtc gaagctacca aggaggccat 

<210> 165 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 165 

cgacgacgac ggctggttgc attctggtga ttttggatat 

<210> 166 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 166 

tacgacgaag atgagcattt ttacgtcgtg gatcgttaca 

<210> 167 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 167 

aggagctgat caaatacaag ggtagccagg ttgctccagc 

<210> 168 
<211> 40 
<212> DNA 
<213> Artificial Sequence
<220> 

<223> An oligonucleotide 
<400> 168 

tgagttggag gagattctgt tgaaaaatcc atgcattcgc 

<210> 169 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 169 

gatgtcgctg tggtcggcat tcctgatctg gaggccggcg 

<210> 170 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 170 

aactgccttc tgctttcgtt gtcaagcagc ctggtaaaga 

<210> 171 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 171 

aattaccgcc aaagaagtgt atgattacct ggctgaacgt 

<210> 172 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 172 

gtgagccata ctaagtactt gcgtggcggc gtgcgttttg 

<210> 173 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 
<223> An oligonucleotide



<400> 173 

ttgactccat ccctcgtaac gtaacaggca aaattacccg 

<210> 174 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 174 

caaggagctg ttgaaacaat tgttggagaa ggccggcggt 

<210> 175 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 175 

tagtaaagtc ttcatgatta tatagaaaaa aaagctagtg 

<210> 176 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 176 

taatcatgaa gactttacta accgccggcc ttctccaaca 

<210> 177 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 177 

attgtttcaa cagctccttg cgggtaattt tgcctgttac 

<210> 178 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 178 
gttacgaggg atggagtcaa caaaacgcac gccgccacgc



<210> 179 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 179 

aagtacttag tatggctcac acgttcagcc aggtaatcat 

<210> 180 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 180 

acacttcttt ggcggtaatt tctttaccag gctgcttgac 

<210> 181 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 181 

aacgaaagca gaaggcagtt cgccggcctc cagatcagga 

<210> 182 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 182 

atgccgacca cagcgacatc gcgaatgcat ggatttttca 

<210> 183 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 183 

acagaatctc ctccaactca gctggagcaa cctggctacc 
<210> 184 
<211> 40
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 184 

cttgtatttg atcagctcct tgtaacgatc cacgacgtaa 

<210> 185 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 185 

aaatgctcat cttcgtcgta atatccaaaa tcaccagaat 

<210> 186 

<211> 40 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 

<400> 186 

gcaaccagcc gtcgtcgtcg atggcctcct tggtagcttc 

<210> 187 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 187 

gacgttattg acataaccct tgctcaccat agggcctttg 

<210> 188 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 188 

atacacagct cgcccacttg gttagggccc aaagccttac 

<210> 189 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220>

<223> An oligonucleotide 
<400> 189 

cagtttcgcg atcagcgatc ttagcagcca tgagtggagt 

<210> 190 
<211> 40
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 190 

gacacggccc aaagagccgc tcttaaactc atcgcggaga 

<210> 191 
<211> 37 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 191 

gactgaataa tagcgctggt agattcggtg aggccga 

<210> 192 

<211> 43 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 

<400> 192 

agccacaacg aatccctgga agattcaagc gtttggcggc 

<210> 193 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 193 

ttcagcgacc tccttagcca gtggagcggc accgcaacac 

<210> 194 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 194

aattcacgca gtgaagacaa gtcgtacttg tccacgagtg 

<210> 195 
<211> 40 
<212> DNA

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 195 

ggctcttaga caaaaacagg atcacgctag gcacgttgat 

<210> 196 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 196 

gacactgcgg acttcataat cttggatggc tttcaagaaa 

<210> 197 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 197 

gcctcctgat caaaacggcg gaacataatc acgcggagac 

<210> 198 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 198 

cgaccataaa gtaacccaaa gtaatatgaa agccgaaagc 

<210> 199 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 199 

atggaagaaa ggcaagtaga ccaagacggt gacaccagga 
<210> 200
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 200 

atcagctgag tgccgtagcg tggatcgaga gcatggatca 

<210> 201 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 201 

gacgcacgca aatgttttga tgggtctgca tgactccctt 

<210> 202 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 202 

tgggagtcca gtagtaccgc tgctacacag aatggctgca 

<210> 203 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 203 

acttgttcca cagggtcgaa gtggagtggt ttaaagtttg 

<210> 204 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 204 

cgatgttgcc gtctgaatag cgagagatga aattaggcaa 

<210> 205 
<211> 40 
<212> DNA 
<213> Artificial Sequence
<220> 

<223> An oligonucleotide 
<400> 205 

agattcgcaa ccgtgaatat tctccacagt gtccaagatg 

<210> 206 
<211> 40 
<212> DNA

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 206 

atgatacgct taataaagtt ggtgcggctt tggacttcca 

<210> 207 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 207 

ggactttgtt cagaatattc ttagtggtga agacaatctg 

<210> 208 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 208 

tggcttagag atacccatga ctttacacag ttcgtcggga 

<210> 209 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 209 

atgtagctct cgttgactgg agccacgatc ataccgatat 

<210> 210 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 
<223> An oligonucleotide
<400> 210 

accatgcggc gatgactgga atgaagaaac gggtattgtt 

<210> 211 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 211 

ttcagcacag atactaacga cgtcgttcat cttgtagcca 

<210> 212 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 212 

caattgtgga gggactgagc cagcaagacg gttgcctcaa 

<210> 213 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 213 

aaaactcctt gtagctcaaa gattcatcgc cgaccacatc 

<210> 214 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 214 

gaccaaggct tgaggcaaat gagagtgctt gcggagagca 

<210> 215 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 215 
cgaaacagca tttcgccggc agtcaaatcc tccaaaggat



<210> 216 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 216 

ggagaggctc agggccatag atgacatttt tctcacgctt 

<210> 217 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 217 

catcatggga tcctgtttcc tgtgtgaaat tgttatccgc 

<210> 218 
<211> 542 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase 



<400> 218 



Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1               5                   10                  15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
            20                  25                  30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
        35                  40                  45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
    50                  55                  60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65                  70                  75                  80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
                85                  90                  95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
            100                 105                 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
        115                 120                 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
    130                 135                 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145                 150                 155                 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
                165                 170                 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
            180                 185                 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
        195                 200                 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Tyr
    210                 215                 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225                 230                 235                 240

Phe His Ala Phe Gly Phe His Ile Thr Leu Gly Tyr Phe Met Val Gly
                245                 250                 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
            260                 265                 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
        275                 280                 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
    290                 295                 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305                 310                 315                 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
                325                 330                 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Ile Ile Gln Ser Leu Arg Asp
            340                 345                 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
        355                 360                 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
    370                 375                 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385                 390                 395                 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
                405                 410                 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
            420                 425                 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
        435                 440                 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
    450                 455                 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465                 470                 475                 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
                485                 490                 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
            500                 505                 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
        515                 520                 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
    530                 535                 540



<210> 219 
<211> 542 
<212> PRT 
<213> Artificial Sequence
<220>

<223> Sequence of a synthetic luciferase



<400> 219 

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Tyr
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe His Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Ile Ile Gln Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540



<210> 220 
<211> 542 
<212> PRT 
<213> Artificial Sequence



<220> 

<223> Sequence of a synthetic luciferase 



<400> 220 
Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser Tyr Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Tyr
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe His Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Ile Ile Gln Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540

<210> 221 
<211> 542 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase



<400> 221 



Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Tyr
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe His Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Ile Ile Gln Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540

<210> 222 
<211> 542 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase
<400> 222 

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Tyr
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe His Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Ile Ile Gln Ser Leu Gly Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540

<210> 223 
<211> 542 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase 
<400> 223 

Met Ile Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Tyr
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe His Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Ile Ile Gln Thr Leu Gly Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Thr Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Val Lys Ala Gly Gly
530 535 540



<210> 224 
<211> 311 
<212> PRT 
<213> Renilla reniformis



<400> 224

Met Thr Ser Lys Val Tyr Asp Pro Glu Gln Arg Lys Arg Met Ile Thr
1 5 10 15

Gly Pro Gln Trp Trp Ala Arg Cys Lys Gln Met Asn Val Leu Asp Ser
20 25 30

Phe Ile Asn Tyr Tyr Asp Ser Glu Lys His Ala Glu Asn Ala Val Ile
35 40 45

Phe Leu His Gly Asn Ala Ala Ser Ser Tyr Leu Trp Arg His Val Val
50 55 60

Pro His Ile Glu Pro Val Ala Arg Cys Ile Ile Pro Asp Leu Ile Gly
65 70 75 80

Met Gly Lys Ser Gly Lys Ser Gly Asn Gly Ser Tyr Arg Leu Leu Asp
85 90 95

His Tyr Lys Tyr Leu Thr Ala Trp Phe Glu Leu Leu Asn Leu Pro Lys
100 105 110

Lys Ile Ile Phe Val Gly His Asp Trp Gly Ala Cys Leu Ala Phe His
115 120 125

Tyr Ser Tyr Glu His Gln Asp Lys Ile Lys Ala Ile Val His Ala Glu
130 135 140

Ser Val Val Asp Val Ile Glu Ser Trp Asp Glu Trp Pro Asp Ile Glu
145 150 155 160

Glu Asp Ile Ala Leu Ile Lys Ser Glu Glu Gly Glu Lys Met Val Leu
165 170 175

Glu Asn Asn Phe Phe Val Glu Thr Met Leu Pro Ser Lys Ile Met Arg
180 185 190

Lys Leu Glu Pro Glu Glu Phe Ala Ala Tyr Leu Glu Pro Phe Lys Glu
195 200 205

Lys Gly Glu Val Arg Arg Pro Thr Leu Ser Trp Pro Arg Glu Ile Pro
210 215 220

Leu Val Lys Gly Gly Lys Pro Asp Val Val Gln Ile Val Arg Asn Tyr
225 230 235 240

Asn Ala Tyr Leu Arg Ala Ser Asp Asp Leu Pro Lys Met Phe Ile Glu
245 250 255

Ser Asp Pro Gly Phe Phe Ser Asn Ala Ile Val Glu Gly Ala Lys Lys
260 265 270

Phe Pro Asn Thr Glu Phe Val Lys Val Lys Gly Leu His Phe Ser Gln
275 280 285

Glu Asp Ala Pro Asp Glu Met Gly Lys Tyr Ile Lys Ser Phe Val Glu
290 295 300

Arg Val Leu Lys Asn Glu Gln
305 310

<210> 225 
<211> 311 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase 
<400> 225 

Met Ala Ser Lys Val Tyr Asp Pro Glu Gln Arg Lys Arg Met Ile Thr
1 5 10 15

Gly Pro Gln Trp Trp Ala Arg Cys Lys Gln Met Asn Val Leu Asp Ser
20 25 30

Phe Ile Asn Tyr Tyr Asp Ser Glu Lys His Ala Glu Asn Ala Val Ile
35 40 45

Phe Leu His Gly Asn Ala Ala Ser Ser Tyr Leu Trp Arg His Val Val
50 55 60

Pro His Ile Glu Pro Val Ala Arg Cys Ile Ile Pro Asp Leu Ile Gly
65 70 75 80

Met Gly Lys Ser Gly Lys Ser Gly Asn Gly Ser Tyr Arg Leu Leu Asp
85 90 95

His Tyr Lys Tyr Leu Thr Ala Trp Phe Glu Leu Leu Asn Leu Pro Lys
100 105 110

Lys Ile Ile Phe Val Gly His Asp Trp Gly Ala Cys Leu Ala Phe His
115 120 125

Tyr Ser Tyr Glu His Gln Asp Lys Ile Lys Ala Ile Val His Ala Glu
130 135 140

Ser Val Val Asp Val Ile Glu Ser Trp Asp Glu Trp Pro Asp Ile Glu
145 150 155 160

Glu Asp Ile Ala Leu Ile Lys Ser Glu Glu Gly Glu Lys Met Val Leu
165 170 175

Glu Asn Asn Phe Phe Val Glu Thr Met Leu Pro Ser Lys Ile Met Arg
180 185 190

Lys Leu Glu Pro Glu Glu Phe Ala Ala Tyr Leu Glu Pro Phe Lys Glu
195 200 205

Lys Gly Glu Val Arg Arg Pro Thr Leu Ser Trp Pro Arg Glu Ile Pro
210 215 220

Leu Val Lys Gly Gly Lys Pro Asp Val Val Gln Ile Val Arg Asn Tyr
225 230 235 240

Asn Ala Tyr Leu Arg Ala Ser Asp Asp Leu Pro Lys Met Phe Ile Glu
245 250 255

Ser Asp Pro Gly Phe Phe Ser Asn Ala Ile Val Glu Gly Ala Lys Lys
260 265 270

Phe Pro Asn Thr Glu Phe Val Lys Val Lys Gly Leu His Phe Ser Gln
275 280 285

Glu Asp Ala Pro Asp Glu Met Gly Lys Tyr Ile Lys Ser Phe Val Glu
290 295 300

Arg Val Leu Lys Asn Glu Gln
305 310

<210> 226 
<211> 311 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase
<400> 226 

Met Ala Ser Lys Val Tyr Asp Pro Glu Gln Arg Lys Arg Met Ile Thr
1 5 10 15

Gly Pro Gln Trp Trp Ala Arg Cys Lys Gln Met Asn Val Leu Asp Ser
20 25 30

Phe Ile Asn Tyr Tyr Asp Ser Glu Lys His Ala Glu Asn Ala Val Ile
35 40 45

Phe Leu His Gly Asn Ala Ala Ser Ser Tyr Leu Trp Arg His Val Val
50 55 60

Pro His Ile Glu Pro Val Ala Arg Cys Ile Ile Pro Asp Leu Ile Gly
65 70 75 80

Met Gly Lys Ser Gly Lys Ser Gly Asn Gly Ser Tyr Arg Leu Leu Asp
85 90 95

His Tyr Lys Tyr Leu Thr Ala Trp Phe Glu Leu Leu Asn Leu Pro Lys
100 105 110

Lys Ile Ile Phe Val Gly His Asp Trp Gly Ala Cys Leu Ala Phe His
115 120 125

Tyr Ser Tyr Glu His Gln Asp Lys Ile Lys Ala Ile Val His Ala Glu
130 135 140

Ser Val Val Asp Val Ile Glu Ser Trp Asp Glu Trp Pro Asp Ile Glu
145 150 155 160

Glu Asp Ile Ala Leu Ile Lys Ser Glu Glu Gly Glu Lys Met Val Leu
165 170 175

Glu Asn Asn Phe Phe Val Glu Thr Met Leu Pro Ser Lys Ile Met Arg
180 185 190

Lys Leu Glu Pro Glu Glu Phe Ala Ala Tyr Leu Glu Pro Phe Lys Glu
195 200 205

Lys Gly Glu Val Arg Arg Pro Thr Leu Ser Trp Pro Arg Glu Ile Pro
210 215 220

Leu Val Lys Gly Gly Lys Pro Asp Val Val Gln Ile Val Arg Asn Tyr
225 230 235 240

Asn Ala Tyr Leu Arg Ala Ser Asp Asp Leu Pro Lys Met Phe Ile Glu
245 250 255

Ser Asp Pro Gly Phe Phe Ser Asn Ala Ile Val Glu Gly Ala Lys Lys
260 265 270

Phe Pro Asn Thr Glu Phe Val Lys Val Lys Gly Leu His Phe Ser Gln
275 280 285

Glu Asp Ala Pro Asp Glu Met Gly Lys Tyr Ile Lys Ser Phe Val Glu
290 295 300

Arg Val Leu Lys Asn Glu Gln
305 310






<210> 227 
<211> 311 
<212> PRT 

<213> Artificial Sequence 



<220> 

<223> Sequence of a synthetic luciferase



<400> 227 
Met Ala Ser Lys Val Tyr Asp Pro Glu Gln Arg Lys Arg Met Ile Thr
1 5 10 15

Gly Pro Gln Trp Trp Ala Arg Cys Lys Gln Met Asn Val Leu Asp Ser
20 25 30

Phe Ile Asn Tyr Tyr Asp Ser Glu Lys His Ala Glu Asn Ala Val Ile
35 40 45

Phe Leu His Gly Asn Ala Ala Ser Ser Tyr Leu Trp Arg His Val Val
50 55 60

Pro His Ile Glu Pro Val Ala Arg Cys Ile Ile Pro Asp Leu Ile Gly
65 70 75 80

Met Gly Lys Ser Gly Lys Ser Gly Asn Gly Ser Tyr Arg Leu Leu Asp
85 90 95

His Tyr Lys Tyr Leu Thr Ala Trp Phe Glu Leu Leu Asn Leu Pro Lys
100 105 110

Lys Ile Ile Phe Val Gly His Asp Trp Gly Ala Cys Leu Ala Phe His
115 120 125

Tyr Ser Tyr Glu His Gln Asp Lys Ile Lys Ala Ile Val His Ala Glu
130 135 140

Ser Val Val Asp Val Ile Glu Ser Trp Asp Glu Trp Pro Asp Ile Glu
145 150 155 160

Glu Asp Ile Ala Leu Ile Lys Ser Glu Glu Gly Glu Lys Met Val Leu
165 170 175

Glu Asn Asn Phe Phe Val Glu Thr Met Leu Pro Ser Lys Ile Met Arg
180 185 190

Lys Leu Glu Pro Glu Glu Phe Ala Ala Tyr Leu Glu Pro Phe Lys Glu
195 200 205

Lys Gly Glu Val Arg Arg Pro Thr Leu Ser Trp Pro Arg Glu Ile Pro
210 215 220

Leu Val Lys Gly Gly Lys Pro Asp Val Val Gln Ile Val Arg Asn Tyr
225 230 235 240

Asn Ala Tyr Leu Arg Ala Ser Asp Asp Leu Pro Lys Met Phe Ile Glu
245 250 255

Ser Asp Pro Gly Phe Phe Ser Asn Ala Ile Val Glu Gly Ala Lys Lys
260 265 270

Phe Pro Asn Thr Glu Phe Val Lys Val Lys Gly Leu His Phe Ser Gln
275 280 285

Glu Asp Ala Pro Asp Glu Met Gly Lys Tyr Ile Lys Ser Phe Val Glu
290 295 300

Arg Val Leu Lys Asn Glu Gln
305 310



<210> 228 
<211> 14 
<212> DNA 
<213> Artificial Sequence



<220> 

<223> A consensus sequence 
<221> misc_feature
<222> (1)...(14)
<223> n = A,T,C or G

<400> 228 
yggmnnnnng ccaa 

<210> 229 
<211> 38 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> A primer 
<400> 229 

gtactgagac gacgccagcc caagcttagg cctgagtg 

<210> 230 
<211> 38 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> A primer 
<400> 230 

ggcatgagcg tgaactgact gaactagcgg ccgccgag 

<210> 231 
<211> 24 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> A primer 
<400> 231 

ggatcccatg gtgaagcgtg agaa 

<210> 232 
<211> 21 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> A primer 
<400> 232 

ggatcccatg gtgaaacgcg a 

<210> 233 
<211> 31 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> A primer 



<400> 233 

ctagcttttt tttctagata atcatgaaga c 

<210> 234 
<211> 54 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> A primer 
<400> 234 

caaaaagctt ggcattccgg tactgttggt aaagccacca tggtgaagcg agag 

<210> 235 
<211> 26 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> A primer 
<400> 235 

caattgttgt tgttaacttg tttatt 

<210> 236 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> A primer 
<400> 236 

aaccatggct tccaaggtgt acgaccccga gcaacgcaaa 

<210> 237 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> A primer 
<400> 237 

gctctagaat tactgctcgt tcttcagcac gcgctccacg 

<210> 238 
<211> 31 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> A primer 
<400> 238 

cgctagccat ggcttcgaaa gtttatgatc c 






<210> 239 
<211> 25 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> A primer 
<400> 239 

ggccagtaac tctagaatta ttgtt 

<210> 240 
<211> 5 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 

<400> 240 
tataa 

<210> 241 
<211> 6 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 

<400> 241 
stratg 

<210> 242 
<211> 9 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 

<221> misc_feature
<222> (1)...(9)
<223> n = A,T,C or G

<400> 242 
mttncnnnna 

<210> 243 
<211> 5 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 

<400> 243 
tratg 



<210> 244 
<211> 7 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> A consensus sequence 
<400> 244 

tgastma 7 

<210> 245 
<211> 14 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> A consensus sequence

<221> misc_feature
<222> (1)...(14)
<223> n = A,T,C or G

<400> 245 

yggmnnnnng ccaa 14 

<210> 246 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 246 

aaccatggct tccaaggtgt acgaccccga gcaacgcaaa 40

<210> 247 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 247 

cgcatgatca ctgggcctca gtggtgggct cgctgcaagc 40 

<210> 248 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 248 

aaatgaacgt gctggactcc ttcatcaact actatgattc 40

<210> 249
<211> 50 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 249 

cgagaagcac gccgagaacg ccgtgatttt tctgcatggt aacgctgcct

<210> 250 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 250 

ccagctacct gtggaggcac gtcgtgcctc acatcgagcc 

<210> 251 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 251 

cgtggctaga tgcatcatcc ctgatctgat cggaatgggt 

<210> 252 
<211> 40
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 252 

aagtccggca agagcgggaa tggctcatat cgcctcctgg 

<210> 253 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 253 

atcactacaa gtacctcacc gcttggttcg agctgctgaa 

<210> 254 
<211> 40 
<212> DNA 
<213> Artificial Sequence



<220> 

<223> An oligonucleotide 
<400> 254 

ccttccaaag aaaatcatct ttgtgggcca cgactggggg 

<210> 255 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 255 

gcttgtctgg cctttcacta ctcctacgag caccaagaca 

<210> 256 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide
<400> 256 

agatcaaggc catcgtccat gctgagagtg tcgtggacgt 

<210> 257 
<211> 45 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 257 

gatcgagtcc tgggacgagt ggcctgacat cgaggaggat atcgc 

<210> 258 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 258 

cctgatcaag agcgaagagg gcgagaaaat ggtgcttgag 

<210> 259 

<211> 40 

<212> DNA 

<213> Artificial Sequence 
<220> 
<223> An oligonucleotide



<400> 259 

aataacttct tcgtcgagac catgctccca agcaagatca 

<210> 260 
<211> 45 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 260 

tgcggaaact ggagcctgag gagttcgctg cctacctgga gccat

<210> 261 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 261 

tcaaggagaa gggcgaggtt agacggccta ccctctcctg 

<210> 262 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 262 

gcctcgcgag atccctctcg ttaagggagg caagcccgac 

<210> 263 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 263 

gtcgtccaga ttgtccgcaa ctacaacgcc taccttcggg 

<210> 264 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 264 
ccagcgacga tctgcctaag atgttcatcg agtccgaccc



<210> 265 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 265 

tgggttcttt tccaacgcta ttgtcgaggg agctaagaag 

<210> 266 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 266 

ttccctaaca ccgagttcgt gaaggtgaag ggcctccact 

<210> 267 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 267 

tcagccagga ggacgctcca gatgaaatgg gtaagtacat 

<210> 268 
<211> 49 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 268 

caagagcttc gtggagcgcg tgctgaagaa cgagcagtaa ttctagagc 

<210> 269 
<211> 29 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 269 

gctctagaat tactgctcgt tcttcagca 
<210> 270 
<211> 40
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 270 

cgcgctccac gaagctcttg atgtacttac ccatttcatc 

<210> 271 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 271 

tggagcgtcc tcctggctga agtggaggcc cttcaccttc 

<210> 272 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 272 

acgaactcgg tgttagggaa cttcttagct ccctcgacaa 

<210> 273 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 273 

tagcgttgga aaagaaccca gggtcggact cgatgaacat 

<210> 274 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 274 

cttaggcaga tcgtcgctgg cccgaaggta ggcgttgtag 

<210> 275 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220>

<223> An oligonucleotide 
<400> 275 

ttgcggacaa tctggacgac gtcgggcttg cctcccttaa 

<210> 276 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 276 

cgagagggat ctcgcgaggc caggagaggg taggccgtct 

<210> 277 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 277 

aacctcgccc ttctccttga atggctccag gtaggcagcg 

<210> 278 
<211> 45 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 278 

aactcctcag gctccagttt ccgcatgatc ttgcttggga gcatg 

<210> 279 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 279 

gtctcgacga agaagttatt ctcaagcacc attttctcgc 

<210> 280 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 280 

cctcttcgct cttgatcagg gcgatatcct cctcgatgtc 

<210> 281 
<211> 43 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 281 

aggccactcg tcccaggact cgatcacgtc cacgacactc tca 

<210> 282 
<211> 42 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 282 

gcatggacga tggccttgat cttgtcttgg tgctcgtagg ag 

<210> 283 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 283 

tagtgaaagg ccagacaagc cccccagtcg tggcccacaa 

<210> 284 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 284 

agatgatttt ctttggaagg ttcagcagct cgaaccaagc 

<210> 285 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 285 

ggtgaggtac ttgtagtgat ccaggaggcg atatgagcca 
<210> 286 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 286 

ttcccgctct tgccggactt acccattccg atcagatcag 

<210> 287 
<211> 45 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 287 

ggatgatgca tctagccacg ggctcgatgt gaggcacgac gtgcc 

<210> 288 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 288 

tccacaggta gctggaggca gcgttaccat gcagaaaaat 

<210> 289 
<211> 45 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 289 

cacggcgttc tcggcgtgct tctcggaatc atagtagttg atgaa 

<210> 290 
<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 290 

ggagtccagc acgttcattt gcttgcagcg agcccaccac 

<210> 291 
<211> 40 
<212> DNA 



<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 291 

tgaggcccag tgatcatgcg tttgcgttgc tcggggtcgt 

<210> 292 
<211> 20 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 
<400> 292 

acaccttgga agccatggtt 

<210> 293 
<211> 10 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> A Kozak sequence 

<400> 293 
aaccatggct 

<210> 294 
<211> 12 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> An oligonucleotide 

<400> 294 
taattctaga gc 

<210> 295 
<211> 32 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> A primer 
<400> 295 

gcgtagccat ggtaaagcgt gagaaaaatg tc 

<210> 296 
<211> 33 
<212> DNA 

<213> Artificial Sequence 
<220> 
<223> A primer 
<400> 296 

ccgactctag attactaacc gccggccttc acc 33 

<210> 297 
<211> 1626 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase 
<400> 297 

atggtgaaac gcgaaaagaa cgtgatctac ggcccagaac cactgcatcc actggaagac 60 

ctcaccgctg gtgagatgct cttccgagca ctgcgtaaac atagtcacct ccctcaagca 120 

ctcgtggacg tcgtgggaga cgagagcctc tcctacaaag aatttttcga agctactgtg 180 

ctgttggccc aaagcctcca taattgtggg tacaaaatga acgatgtggt gagcatttgt 240 

gctgagaata acactcgctt ctttattcct gtaatcgctg cttggtacat cggcatgatt 300 

gtcgcccctg tgaatgaatc ttacatccca gatgagctgt gtaaggttat gggtattagc 360 

aaacctcaaa tcgtctttac taccaaaaac atcttgaata aggtcttgga agtccagtct 420 

cgtactaact tcatcaaacg catcattatt ctggataccg tcgaaaacat ccacggctgt 480 

gagagcctcc ctaacttcat ctctcgttac agcgatggta atatcgctaa tttcaagccc 540 

ttgcattttg atccagtcga gcaagtggcc gctattttgt gctcctccgg caccactggt 600 

ttgcctaaag gtgtcatgca gactcaccag aatatctgtg tgcgtttgat ccacgctctc 660 

gaccctcgtg tgggtactca attgatccct ggcgtgactg tgctggtgta tctgcctttc 720 

tttcacgcct ttggtttctc tattaccctg ggctatttca tggtcggctt gcgtgtcatc 780 

atgtttcgtc gcttcgacca agaagccttc ttgaaggcta ttcaagacta cgaggtgcgt 840 

tccgtgatca acgtcccttc agtcattttg ttcctgagca aatctccttt ggttgacaag 900 

tatgatctga gcagcttgcg tgagctgtgc tgtggcgctg ctcctttggc caaagaagtg 960 

gccgaggtcg ctgctaagcg tctgaacctc cctggtatcc gctgcggttt tggtttgact 1020 

gagagcactt ctgctaacat ccatagcttg cgagacgagt ttaagtctgg tagcctgggt 1080 

cgcgtgactc ctcttatggc tgcaaagatc gccgaccgtg agaccggcaa agcactgggc 1140 

ccaaatcaag tcggtgaatt gtgtattaag ggccctatgg tctctaaagg ctacgtgaac 1200 

aatgtggagg ccactaaaga agccattgat gatgatggct ggctccatag cggcgacttc 1260 

ggttactatg atgaggacga acacttctat gtggtcgatc gctacaaaga attgattaag 1320 

tacaaaggct ctcaagtcgc accagccgaa ctggaagaaa ttttgctgaa gaacccttgt 1380 

atccgcgacg tggccgtcgt gggtatccca gacttggaag ctggcgagtt gcctagcgcc 1440 

tttgtggtga aacaacccgg caaggagatc actgctaagg aggtctacga ctatttggcc 1500 

gagcgcgtgt ctcacaccaa atatctgcgt ggcggcgtcc gcttcgtcga ttctattcca 1560 

cgcaacgtta ccggtaagat cactcgtaaa gagttgctga agcaactcct cgaaaaagct 1620 

ggcggc 1626 

<210> 298 
<211> 542 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase 
<400> 298 

Met Val Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His 
1 5 10 15 

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg 
20 25 30 

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu 
35 40 45 

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln 
50 55 60 

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys 
65 70 75 80 

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr 
85 90 95 

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu 
100 105 110 

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr 
115 120 125 

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe 
130 135 140 

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys 
145 150 155 160 

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala 
165 170 175 

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile 
180 185 190 

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr 
195 200 205 

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Val 
210 215 220 

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe 
225 230 235 240 

Phe His Ala Phe Gly Phe Ser Ile Thr Leu Gly Tyr Phe Met Val Gly 
245 250 255 

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys 
260 265 270 

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val 
275 280 285 

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser 
290 295 300 

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val 
305 310 315 320 

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly 
325 330 335 

Phe Gly Leu Thr Glu Ser Thr Ser Ala Asn Ile His Ser Leu Arg Asp 
340 345 350 

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala 
355 360 365 

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val 
370 375 380 

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn 
385 390 395 400 

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His 
405 410 415 

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val 
420 425 430 

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro 
435 440 445 

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val 
450 455 460 

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala 
465 470 475 480 

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr 
485 490 495 

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly 
500 505 510 

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr 
515 520 525 

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly 
530 535 540 

<210> 299 
<211> 1626 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase 
<400> 299 

atggtgaagc gtgagaaaaa tgtcatctat ggccctgagc ctctccatcc tttggaggat 60 

ttgactgccg gcgaaatgct gtttcgtgct ctccgcaagc actctcattt gcctcaagcc 120 

ttggtcgatg tggtcggcga tgaatctttg agctacaagg agttttttga ggcaaccgtc 180 

ttgctggctc agtccctcca caattgtggc tacaagatga acgacgtcgt tagtatctgt 240 

gctgaaaaca atacccgttt cttcattcca gtcatcgccg catggtatat cggtatgatc 300 

gtggctccag tcaacgagag ctacattccc gacgaactgt gtaaagtcat gggtatctct 360 

aagccacaga ttgtcttcac cactaagaat attctgaaca aagtcctgga agtccaaagc 420 

cgcaccaact ttattaagcg tatcatcatc ttggacactg tggagaatat tcacggttgc 480 

gaatctttgc ctaatttcat ctctcgctat tcagacggca acatcgcaaa ctttaaacca 540 

ctccacttcg accctgtgga acaagttgca gccattctgt gtagcagcgg tactactgga 600 

ctcccaaagg gagtcatgca gacccatcaa aacatttgcg tgcgtctgat ccatgctctc 660 

gatccacgct acggcactca gctgattcct ggtgtcaccg tcttggtcta cttgcctttc 720 

ttccatgctt tcggctttca tattactttg ggttacttta tggtcggtct ccgcgtgatt 780 

atgttccgcc gttttgatca ggaggctttc ttgaaagcca tccaagatta tgaagtccgc 840 

agtgtcatca acgtgcctag cgtgatcctg tttttgtcta agagcccact cgtggacaag 900 

tacgacttgt cttcactgcg tgaattgtgt tgcggtgccg ctccactggc taaggaggtc 960 

gctgaagtgg ccgccaaacg cttgaatctt ccagggattc gttgtggctt cggcctcacc 1020 

gaatctacca gcgctattat tcagtctctc cgcgatgagt ttaagagcgg ctctttgggc 1080 

cgtgtcactc cactcatggc tgctaagatc gctgatcgcg aaactggtaa ggctttgggc 1140 

ccgaaccaag tgggcgagct gtgtatcaaa ggccctatgg tgagcaaggg ttatgtcaat 1200 

aacgttgaag ctaccaagga ggccatcgac gacgacggct ggttgcattc tggtgatttt 1260 

ggatattacg acgaagatga gcatttttac gtcgtggatc gttacaagga gctgatcaaa 1320 

tacaagggta gccaggttgc tccagctgag ttggaggaga ttctgttgaa aaatccatgc 1380 

attcgcgatg tcgctgtggt cggcattcct gatctggagg ccggcgaact gccttctgct 1440 

ttcgttgtca agcagcctgg taaagaaatt accgccaaag aagtgtatga ttacctggct 1500 

gaacgtgtga gccatactaa gtacttgcgt ggcggcgtgc gttttgttga ctccatccct 1560 

cgtaacgtaa caggcaaaat tacccgcaag gagctgttga aacaattgtt ggagaaggcc 1620 

ggcggt 1626 

<210> 300 
<211> 542 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase 
<400> 300 

Met Val Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His 
1 5 10 15 

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg 
20 25 30 

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu 
35 40 45 

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln 
50 55 60 

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys 
65 70 75 80 

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr 
85 90 95 

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu 
100 105 110 

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr 
115 120 125 

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe 
130 135 140 

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys 
145 150 155 160 

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala 
165 170 175 

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile 
180 185 190 

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr 
195 200 205 

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Tyr 
210 215 220 

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe 
225 230 235 240 

Phe His Ala Phe Gly Phe His Ile Thr Leu Gly Tyr Phe Met Val Gly 
245 250 255 

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys 
260 265 270 

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val 
275 280 285 

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser 
290 295 300 

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val 
305 310 315 320 

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly 
325 330 335 

Phe Gly Leu Thr Glu Ser Thr Ser Ala Ile Ile Gln Ser Leu Arg Asp 
340 345 350 

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala 
355 360 365 

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val 
370 375 380 

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn 
385 390 395 400 

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His 
405 410 415 

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val 
420 425 430 

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro 
435 440 445 

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val 
450 455 460 

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala 
465 470 475 480 

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr 
485 490 495 

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly 
500 505 510 

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr 
515 520 525 

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly 
530 535 540 

<210> 301 
<211> 1626 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase 
<400> 301 

atggtaaagc gtgagaaaaa tgtcatctat ggccctgagc ctctccatcc tttggaggat 60 

ttgactgccg gcgaaatgct gtttcgtgct ctccgcaagc actctcattt gcctcaagcc 120 

ttggtcgatg tggtcggcga tgaatctttg agctacaagg agttttttga ggcaaccgtc 180 

ttgctggctc agtccctcca caattgtggc tacaagatga acgacgtcgt tagtatctgt 240 

gctgaaaaca atacccgttt cttcattcca gtcatcgccg catggtatat cggtatgatc 300 

gtggctccag tcaacgagag ctacattccc gacgaactgt gtaaagtcat gggtatctct 360 

aagccacaga ttgtcttcac cactaagaat attctgaaca aagtcctgga agtccaaagc 420 

cgcaccaact ttattaagcg tatcatcatc ttggacactg tggagaatat tcacggttgc 480 

gaatctttgc ctaatttcat ctctcgctat tcagacggca acatcgcaaa ctttaaacca 540 

ctccacttcg accctgtgga acaagttgca gccattctgt gtagcagcgg tactactgga 600 

ctcccaaagg gagtcatgca gacccatcaa aacatttgcg tgcgtctgat ccatgctctc 660 

gatccacgct acggcactca gctgattcct ggtgtcaccg tcttggtcta cttgcctttc 720 

ttccatgctt tcggctttca tattactttg ggttacttta tggtcggtct ccgcgtgatt 780 

atgttccgcc gttttgatca ggaggctttc ttgaaagcca tccaagatta tgaagtccgc 840 

agtgtcatca acgtgcctag cgtgatcctg tttttgtcta agagcccact cgtggacaag 900 

tacgacttgt cttcactgcg tgaattgtgt tgcggtgccg ctccactggc taaggaggtc 960 

gctgaagtgg ccgccaaacg cttgaatctt ccagggattc gttgtggctt cggcctcacc 1020 

gaatctacca gtgcgattat ccagactctc ggggatgagt ttaagagcgg ctctttgggc 1080 

cgtgtcactc cactcatggc tgctaagatc gctgatcgcg aaactggtaa ggctttgggc 1140 

ccgaaccaag tgggcgagct gtgtatcaaa ggccctatgg tgagcaaggg ttatgtcaat 1200 

aacgttgaag ctaccaagga ggccatcgac gacgacggct ggttgcattc tggtgatttt 1260 

ggatattacg acgaagatga gcatttttac gtcgtggatc gttacaagga gctgatcaaa 1320 

tacaagggta gccaggttgc tccagctgag ttggaggaga ttctgttgaa aaatccatgc 1380 

attcgcgatg tcgctgtggt cggcattcct gatctggagg ccggcgaact gccttctgct 1440 

ttcgttgtca agcagcctgg tacagaaatt accgccaaag aagtgtatga ttacctggct 1500 

gaacgtgtga gccatactaa gtacttgcgt ggcggcgtgc gttttgttga ctccatccct 1560 

cgtaacgtaa caggcaaaat tacccgcaag gagctgttga aacaattgtt ggtgaaggcc 1620 

ggcggt 1626 

<210> 302 
<211> 542 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase 
<400> 302 

Met Val Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His 
1 5 10 15 

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg 
20 25 30 

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu 
35 40 45 

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln 
50 55 60 

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys 
65 70 75 80 

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr 
85 90 95 

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu 
100 105 110 

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr 
115 120 125 

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe 
130 135 140 

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys 
145 150 155 160 

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala 
165 170 175 

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile 
180 185 190 

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr 
195 200 205 

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Tyr 
210 215 220 

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe 
225 230 235 240 

Phe His Ala Phe Gly Phe His Ile Thr Leu Gly Tyr Phe Met Val Gly 
245 250 255 

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys 
260 265 270 

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val 
275 280 285 

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser 
290 295 300 

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val 
305 310 315 320 

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly 
325 330 335 

Phe Gly Leu Thr Glu Ser Thr Ser Ala Ile Ile Gln Thr Leu Gly Asp 
340 345 350 

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala 
355 360 365 

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val 
370 375 380 

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn 
385 390 395 400 

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His 
405 410 415 

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val 
420 425 430 

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro 
435 440 445 

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val 
450 455 460 

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala 
465 470 475 480 

Phe Val Val Lys Gln Pro Gly Thr Glu Ile Thr Ala Lys Glu Val Tyr 
485 490 495 

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly 
500 505 510 

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr 
515 520 525 

Arg Lys Glu Leu Leu Lys Gln Leu Leu Val Lys Ala Gly Gly 
530 535 540 